Topic Modeling by LDA
(Latent Dirichlet Allocation)
Topic Modeling
Topic modeling is a technique for uncovering the underlying topics in a
document. In simple words, it helps identify what the document is
talking about: the important topics in the article.
Types of Topic Models
1) Latent Semantic Indexing (LSI)
2) Latent Dirichlet Allocation (LDA)
3) Probabilistic Latent Semantic Indexing (PLSI)
Document -> topics -> words
Rupak Roy
Topic Modeling - LDA
DOCUMENT

Topics                       Technology         Healthcare   Business
% of topic in the document   30%                60%          17%
Bag of words                 Google, Dell,      Radiology,   Transactions,
                             Apple, Microsoft   Diagnose     Bank, Cost
Behind LDA
Topic 1: Technology: Google, Dell, Apple, Microsoft
Topic 2: Healthcare: Radiology, Diagnose, CT Scan
Topic 3: Business: Transactions, Banks, Cost
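The picture above can be read as a generative story: a document mixes topics in some proportions, and each word is produced by first picking a topic and then picking a word from that topic's bag. A minimal base-R sketch of that story follows. The proportions (0.3, 0.6, 0.1), the uniform choice of words within a topic, and the names `topic_words`, `doc_topic_prop`, and `generate_document` are illustrative assumptions, not values from a fitted model.

```r
# Illustrative topic -> words table (bags of words taken from the slide).
topic_words <- list(
  Technology = c("Google", "Dell", "Apple", "Microsoft"),
  Healthcare = c("Radiology", "Diagnose", "CT Scan"),
  Business   = c("Transactions", "Banks", "Cost")
)

# Assumed topic proportions for one document (chosen to sum to 1).
doc_topic_prop <- c(Technology = 0.3, Healthcare = 0.6, Business = 0.1)

# Draw n words: pick a topic for each word, then a word from that topic.
generate_document <- function(n, topic_prop, words_by_topic) {
  topics <- sample(names(topic_prop), size = n, replace = TRUE, prob = topic_prop)
  words  <- vapply(topics, function(t) sample(words_by_topic[[t]], 1), character(1))
  data.frame(topic = topics, word = unname(words), stringsAsFactors = FALSE)
}

set.seed(42)
doc <- generate_document(20, doc_topic_prop, topic_words)
print(table(doc$topic))
```

Fitting LDA runs this story in reverse: only the words are observed, and the topic proportions and bags of words are inferred from them.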
Topic Modeling
How often does "Diagnose" appear in the topic Healthcare?
If the word "Diagnose" often occurs in the topic Healthcare, then this
instance of "Diagnose" might belong to the topic Healthcare.
Now, how common is the topic Healthcare in the rest of the document?
This is similar to Bayes' theorem.
To find the probability of a possible topic T, multiply the frequency of
the word type W in T by the number of other words in document D that
already belong to T.
The output is therefore the probability that this word came from topic T
(up to normalization over all topics):
\[ P(T \mid W, D) \propto \frac{\text{count of } W \text{ in } T}{\text{words in } D} \times (\text{words in } D \text{ that belong to } T) \]
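As a worked illustration of the rule above, the base-R sketch below scores each topic for one occurrence of "Diagnose" and normalizes the scores so they sum to 1. All counts are made-up toy numbers, and the names `word_in_topic`, `doc_words_in_topic`, and `doc_length` are hypothetical. Collapsed Gibbs samplers for LDA also add Dirichlet smoothing terms, which are left out here to stay close to the slide.

```r
# Toy counts (assumed): how often "Diagnose" is currently assigned to each topic.
word_in_topic <- c(Technology = 1, Healthcare = 40, Business = 2)

# Toy counts (assumed): other words in document D already assigned to each topic.
doc_words_in_topic <- c(Technology = 15, Healthcare = 30, Business = 5)

# Number of words in document D (sum of the assignments above).
doc_length <- sum(doc_words_in_topic)

# Score per topic: (W in T / words in D) * (words in D that belong to T).
score <- (word_in_topic / doc_length) * doc_words_in_topic

# Normalize so the scores form a probability distribution over topics.
p_topic <- score / sum(score)
print(round(p_topic, 3))

# The topic with the largest normalized score.
best_topic <- names(which.max(p_topic))
print(best_topic)
```

With these toy counts the word is frequent in Healthcare and the document already leans toward Healthcare, so both factors push the assignment the same way.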
Topic Modeling - LDA
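library(RTextTools)   # create_matrix()
library(topicmodels)  # LDA(), posterior(), topics(), terms()

# Load the tweets from a CSV file chosen interactively
tweets <- read.csv(file.choose())
View(tweets)
names(tweets)

# Keep columns 6 and 11 (used below as the airline and the tweet text)
tweets1 <- tweets[, c(6, 11)]
names(tweets1)
dim(tweets1)

# Rename the second column to "tweets"
names(tweets1)[2] <- "tweets"
View(tweets1)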
# Create a document-term matrix (DTM): one row per tweet, one column per term
matrix <- create_matrix(cbind(as.vector(tweets1$airline), as.vector(tweets1$tweets)),
                        language = "english", removeNumbers = TRUE,
                        removePunctuation = TRUE, removeSparseTerms = 0,
                        removeStopwords = TRUE, stripWhitespace = TRUE,
                        toLower = TRUE)
# Inspect a small corner of the DTM
tm::inspect(matrix[1:5, 1:10])

# Choose the number of topics
k <- 15

# Split the data into training and test sets
# (a small subset keeps the run time short)
train <- matrix[1:500, ]
test  <- matrix[501:750, ]
# Full data:
# train <- matrix[1:10248, ]
# test  <- matrix[10249:14640, ]
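# Build the model on the training data (seed fixed so the run is repeatable)
train.lda <- LDA(train, k, control = list(seed = 1234))

# The 5 most likely topics for each document
# (by default only the single most likely topic is returned)
top.topics <- get_topics(train.lda, 5)
View(top.topics)

# The 5 most probable terms in each topic
# (by default only the single most probable term is returned)
top.terms <- get_terms(train.lda, 5)
View(top.terms)

# Most likely topic for each training document
train.topics <- topics(train.lda)

# Test the model: posterior topic probabilities for the unseen documents
test.posterior <- posterior(train.lda, test)
# Rows are documents, columns are the topics (up to k = 15)
test.posterior$topics[1:10, 1:k]

# Predicted topic = the column with the highest probability in each row
test.topics <- apply(test.posterior$topics, 1, which.max)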
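To make the last step concrete, the base-R sketch below applies the same `apply(..., 1, which.max)` call to a small hand-made probability matrix. The matrix `toy_posterior` and its values are invented for illustration; with a fitted model the matrix would come from `posterior()`.

```r
# Toy posterior: 3 documents (rows) x 4 topics (columns); each row sums to 1.
toy_posterior <- matrix(
  c(0.10, 0.70, 0.15, 0.05,
    0.25, 0.25, 0.40, 0.10,
    0.60, 0.20, 0.10, 0.10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), paste0("topic", 1:4))
)

# For each row, the index of the column holding the largest probability.
pred_topic <- apply(toy_posterior, 1, which.max)
print(pred_topic)

# The probability attached to each prediction, useful for spotting weak assignments.
pred_prob <- apply(toy_posterior, 1, max)
print(pred_prob)
```

Looking at `pred_prob` alongside `pred_topic` shows which documents have a clear winner and which are spread across several topics, something the topic number alone hides.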
# Join the predicted topic number to the original test data
test1 <- tweets[501:750, ]
final <- data.frame(Title = test1$airline, Subject = test1$text,
                    Pred.topic = test.topics)
View(final)

# Number of test tweets assigned to each topic
table(final$Pred.topic)

# View the tweets assigned to a single topic, e.g. topic 10
View(final[final$Pred.topic == 10, ])
# --------------- Another method to get the optimal number of topics ---------------
library(topicmodels)

# Fit one LDA model for every k in 2..20
best.model <- lapply(seq(2, 20, by = 1), function(k) LDA(matrix, k))

# Log-likelihood is one way to measure how well each model fits the data
log_lik <- sapply(best.model, function(m) as.numeric(logLik(m)))
final_best_model <- data.frame(topics = seq(2, 20, by = 1),
                               log_likelihood = log_lik)
# The higher the log-likelihood, the better the model fits.
# On the data used for fitting it tends to rise as k grows,
# so treat the maximum as a guide rather than a final answer.
head(final_best_model)

# Plot log-likelihood against the number of topics
library(ggplot2)
ggplot(final_best_model, aes(topics, log_likelihood)) + geom_point(colour = "red")
# Pick the k with the highest log-likelihood
k <- final_best_model[which.max(final_best_model$log_likelihood), 1]
cat("Best topic number k =", k, "\n")
Steps in Topic Modeling
1) Load the data
2) Create the document-term matrix (DTM)
3) Choose the number of topics (K)
4) Divide the data into train and test sets
5) Build the model on the train data
6) Get the topics
7) Test the model
8) Join the predicted topic number to the original dataset
9) Analyze