A Survey on Text Categorization with Machine Learning Chikayama lab. Dai Saito
Introduction: Text Categorization
Introduction: Text Categorization
Introduction: Machine Learning
Introduction: flow of ML (figure labels: Label1, Label2, ?)
Outline
Number of labels (figure labels: Yes, No; L1 L2 L3 L4)
Types of labels
Outline
Feature of Text
Preprocessing
Term Weighting
Sentiment Weighting (figure: d(good, happy) = 2, d(bad, happy) = 4; nodes: good, bad, happy)
Dimension Reduction
Dimension Reduction
Outline
Learning Algorithm
Naïve Bayes
k-Nearest Neighbor (figure labels: d1, d2, θ, k=3)
Boosting
Simple example of Boosting + + + + + - - - - - + + + + + - - - - - 1. - - + + + + + - - - 2. + + + + + - - - - - 3.
Support Vector Machine
Text Categorization with SVM
Comparison of these methods .920 .870 SVM Boosting Naïve Bayes k-NN Method .878 .795 .860 Ver.1(90) - .815 .823 Ver.2(10)
Hierarchical Learning
TreeBoost root L1 L2 L3 L4 L11 L12 L41 L42 L43 L421 L422
Outline
Conclusion


Editor's notes

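  1. As the Internet has spread and more documents are produced electronically, large volumes of digital data such as e-mail, news, and blogs have become available. As a result, in terms of both time and labor cost, there is a growing need to classify large numbers of documents efficiently without human intervention.
  2. Example applications include automatically determining which topic a text belongs to and extracting opinions about products or services from the Web.
  3. The most widely used approach to classifying text automatically is machine learning based on textual information such as words. Machine learning divides broadly into supervised and unsupervised learning; this seminar talk covers supervised learning.
  4. Here is the main flow of machine learning in text categorization. First, text written in natural language is converted into a form a machine can handle (feature extraction). Next, a learner is trained on those features (learning). When unseen data arrives, it is classified with the trained learner (classification). Because text categorization follows almost the same flow as machine learning in general, it has been studied widely in the machine learning field. This survey examines the methods used at each of these stages.
  5. This part explains feature extraction from text data. First, data written in natural language must be converted into some numerical form, for example by using morphological analysis.
  6. In this step, words that occur extremely often, such as "the" and "for" in English, need to be removed as "stop words".
  7. The simplest method that comes to mind is to count how often each word occurs. Consider a matrix of size (number of documents) × (number of words) that records how many times each word occurs in each document. The data is then very simple to handle, but because only occurrence counts are considered, accuracy is not very high.
  8. This is where tf-idf comes in. It is the product of (the frequency of a word in a document) × (the inverse of the number of documents containing the word), so that words that occur often in a document but rarely across the collection receive a high weight. It is widely used for feature extraction in text categorization. Using tf-idf, or a normalized version of it, as the document features has become the de facto standard, and little new research is being done on this point.
  9. Left as is, the vectors representing the documents become quite large: (number of documents) × (number of words in the dictionary). Feature selection is therefore used to reduce the number of dimensions.
  10. One technique used here is to set a threshold on occurrence frequency: words that do not appear in at least a certain number of documents are not used for learning. This rests on the conjecture that words appearing in only very few documents are unlikely to help classification.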
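
The feature-extraction steps described in notes 6 to 10 (stop-word removal, term counts, tf-idf weighting, and a document-frequency threshold) can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides. The stop-word list, the sample documents, the function names, and the `min_df` parameter are all assumptions made for the example. The notes describe idf only as "the inverse of the number of documents containing the word"; the sketch assumes the common logarithmic form log(N / df).

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"the", "for", "a", "is", "of"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (note 6)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def term_counts(docs):
    """Raw term counts for each document (note 7)."""
    return [Counter(tokenize(d)) for d in docs]

def document_frequency(counts):
    """Number of documents in which each term occurs."""
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return df

def select_terms(df, min_df):
    """Keep only terms that occur in at least min_df documents (note 10)."""
    return sorted(t for t, n in df.items() if n >= min_df)

def tfidf_vectors(counts, df, vocab):
    """Weight each term by count * log(N / df) (note 8, log-idf assumed)."""
    n_docs = len(counts)
    return [[c[t] * math.log(n_docs / df[t]) for t in vocab] for c in counts]

# Made-up sample documents, purely for illustration.
docs = [
    "the match ended in a draw",
    "the team won the match",
    "the market fell for a second day",
]
counts = term_counts(docs)
df = document_frequency(counts)
vocab = select_terms(df, min_df=2)
vectors = tfidf_vectors(counts, df, vocab)
```

With these three documents only "match" appears in two or more of them, so the threshold shrinks the vocabulary to a single dimension. On a real corpus the threshold would be tuned, and the resulting vectors are often length-normalized before training.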