SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Introduction to
Active Learning

Alexey Voropaev
Mail.Ru have own Search Engine

    – It is not «patched Google»

    – Completely out product

    – 8.3% of market

    - The same engine for:
        - web
        - image
        - video
        - news
        - real time
        - e-mail search
        - etc.


                                     2
1
W
    We love Machine Learning :)

    Simplified Search Engine Architecture:

                              ML
                   Crawler         – StaticRank: crawling priority
     Web
                              ML
                   Analyzer        – Antispam     – Object extraction
                                   – Antiporn     – User behaviour

                   Indexer

                              ML
                                 – Ranking
                  Searchers
                                 – Lingustic
                              ML – Query processing
                   Frontend        – Antirobot
                                                                        3
                                   – Decision when to show vertical search
2
What is Supervised Machine Learning?



      «Machine learning is the hot new thing»
                                   John Hennessy, President, Stanford




                                                                   4
3
Supervised Machine Learning


     There is unknown function



     Given:
         Set of samples



     Goal:
          Build approximation     of   using



                                               5
4
Take good features




           A — impossible               B — possible

                         Lenear model
                                                       6
5
Proper model selection




                                                                                                             7
                             Image from: Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer. 2006
6
Importance of train set construction




      Usually:
        - random sampling is not the best strategy
        - labeling is very expensive                 8
7
Information about train set construction


    Modern books about ML:

     - Pattern Recognition and Machine Learning — 0
     - Elements of Statistical Learning — 0
     - An Introduction to Information Retrieval — 1 paragraph


    Book about TS construction:

    Burr Settles. Active learning.
       - Publication Date: 2 July 2012



                                                          9
8
Idea of Active Learning




                              Image from: Burr Settles. Active Learning Literature Survey. Computer
                                                                                         10
                              Sciences Technical Report 1648, University of Wisconsin–Madison. 2009.
9
Active learning workflow


 Main assumption: obtaining an unlabeled instance is free




                                 Image from: Burr Settles. Active Learning Literature Survey. Computer
                                                                                             11
                                 Sciences Technical Report 1648, University of Wisconsin–Madison. 2009.
10
Sampling scenarios




                      Image from: Burr Settles. Active Learning Literature Survey. Computer
                                                                                  12
                      Sciences Technical Report 1648, University of Wisconsin–Madison. 2009.
11
Uncertainty Sampling


 Take instances about which it is least certain how to label.




     Least confident:

     Margin sampling


     Entropy


                                                          13
12
Query-By-Committee




                      14
13
Query-By-Committee




                      15
14
Query-By-Committee




                      16
15
Query-By-Committee

     Measure of committee disagreement:



      Vote entropy



      Kullback-Leibler (KL)
      divergence:




                                          17
16
Query-By-Committee (QBag)

     Input: T – labeled train set
            С – size of the committe
            A – learning algorithm
            U – set of unlabeled objects

     1. Uniformly resample T, obtain T1...TС, where |Ti| < |T|
     2. For each Ti build model Hi using A
     3. Select x* = min x∈U | |Hi(x)=1| - |Hi(x)=0| |
     4. Pass x* to oracle and update T
     5. Repeat from 1until convergence



                                                                 18
17
QBag: Quality




                                                                               19
                 K.Dwyer, R.Holte, Decision Tree Instability and Active Learning, 2007
18
QBag: Stability




                                                                                 20
                   K.Dwyer, R.Holte, Decision Tree Instability and Active Learning, 2007
19
Density-Weighted Methods

 Idea: Inhabit dense/sparse regions of the input space




                                                         21
20
Other methods




     Expected Model Change

     Expected Error Reduction

     Variance Reduction

     Many «model dependent» methods




                                      22
21
Example 1:
 Recognition of sentence boundaries

     «I recommend Mitchell (1997) or Duda et al. (2001). I have strived...»


      Idea:

       1. Make set of simple regexp-based rules like
       «big letter at the right» or
         «digits at the rigth and left»
       (40 rules)

       2. Build classifier:
        Gradient Boosted Decision Trees


      A.Kudinov, A.Voropaev, A.Kalinin. A HIGHLY PRECISIONAL METHOD FOR THE
      RECOGNITION OF SENTENCE BOUNDARIES. Computational Linguistics and Intellectual
      Technologies. 2011                                                     23
22
Example 1:
 Recognition of sentence boundaries




             OpenNLP / «AOT»
                        project
                                      Mail.Ru
                                      detector /   Mail.Ru
             Sentence                 Random       detector /
             Boundary                 sampling     least
             Detector /                            confident
             Random
             sampling


     Error   41,2 %      30,4 %       8.2 %        0,8 %
     Rate




     Train set: 9820 examples
     Validation: 500 examples                                   24
23
Example 2:
 Ranking formula
     Idea:
         1. Query-Document presented by x, where:
             x1 – tf-idf rank
             x2 – query geographical
             x3 – geo-region of user is equal to document's region
             (600 other factors)

         2. Build train set:
             5 – vital
             4 – exact answer
             3 – usefull
             2 – slightly usefull
             1 – out of topic
             0 – can't be labeled (unknown language, etc.)

         3. Train ranking formula using modified LambdaRank


                                                                     25
24
Example 2:
 Self-organizing map

 Idea: mapping from N-dimentional to 2-dimentional




                                                     26
25
Example 2:
 Map of train set for ranking




                                27
26
Example 2:
 Density of QDocuments on the map




                                    High density
                                    Low density




                                               28
27
Example 2:
 Result of “SOM miss” selection




                                  Old documents
                                  New documents




                                             29
28
Example 2:
 Qbag + SOM heuristic

     1. Transform regression to binary classification problem:
         - We should order pair of QDocuments
         - Apply QBag and select pairs of QDocuments

     2. Apply SOM heuristic
         - Construct initial trainset T
         - Filter selected pairs




                                          A) Don't take pair
                                          B) May be
                                          C) Take pair

                                                                 30
29
Example 2:
 Qbag + SOM heuristic: results




                                 31
30
Example 3,4:
 Antispam, Antiporn

 Antispam
    – Neuron net
    – SOM balancing




 Antiporn
    – Decision trees
    – Uncertanty sampling




                            32
31
Reference




                   Thank you!

     Reference:

       - http://active-learning.net/

       - Burr Settles. Active Learning Literature Survey.
         Computer Sciences Technical Report 1648,
         University of Wisconsin–Madison. 2009.
                                                            33
32

Contenu connexe

Tendances

Deep Learning
Deep LearningDeep Learning
Deep LearningJun Wang
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Ha Phuong
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)Thomas da Silva Paula
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingGrigory Sapunov
 
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII
 
HTM Spatial Pooler
HTM Spatial PoolerHTM Spatial Pooler
HTM Spatial PoolerNumenta
 
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Universitat Politècnica de Catalunya
 
Yann le cun
Yann le cunYann le cun
Yann le cunYandex
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep LearningDavid Khosid
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsbutest
 
An Introduction to Deep Learning (April 2018)
An Introduction to Deep Learning (April 2018)An Introduction to Deep Learning (April 2018)
An Introduction to Deep Learning (April 2018)Julien SIMON
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Greg Makowski
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature surveyAkshay Hegde
 

Tendances (20)

Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Analyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et WekaAnalyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et Weka
 
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
 
HTM Spatial Pooler
HTM Spatial PoolerHTM Spatial Pooler
HTM Spatial Pooler
 
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
 
Yann le cun
Yann le cunYann le cun
Yann le cun
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 
Recommandation sociale : filtrage collaboratif et par le contenu
Recommandation sociale : filtrage collaboratif et par le contenuRecommandation sociale : filtrage collaboratif et par le contenu
Recommandation sociale : filtrage collaboratif et par le contenu
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical models
 
An Introduction to Deep Learning (April 2018)
An Introduction to Deep Learning (April 2018)An Introduction to Deep Learning (April 2018)
An Introduction to Deep Learning (April 2018)
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Learning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep visionLearning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep vision
 

Similaire à Introduction to active learning

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsSeldon
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
Probablistic information retrieval
Probablistic information retrievalProbablistic information retrieval
Probablistic information retrievalNisha Arankandath
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"ieee_cis_cyprus
 
ML Basic Concepts.pdf
ML Basic Concepts.pdfML Basic Concepts.pdf
ML Basic Concepts.pdfManishaS49
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Using MongoDB for Materials Discovery
Using MongoDB for Materials DiscoveryUsing MongoDB for Materials Discovery
Using MongoDB for Materials DiscoveryDan Gunter
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewVahid Mirjalili
 
Artificial Intelligence and its application
Artificial Intelligence and its applicationArtificial Intelligence and its application
Artificial Intelligence and its applicationFELICIALILIANJ
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeSiby Jose Plathottam
 
Applied Machine Learning
Applied Machine LearningApplied Machine Learning
Applied Machine Learningbutest
 

Similaire à Introduction to active learning (20)

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative models
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
20181212 ibm aot
20181212 ibm aot20181212 ibm aot
20181212 ibm aot
 
Probablistic information retrieval
Probablistic information retrievalProbablistic information retrieval
Probablistic information retrieval
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"
 
ML Basic Concepts.pdf
ML Basic Concepts.pdfML Basic Concepts.pdf
ML Basic Concepts.pdf
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
AI Presentation 1
AI Presentation 1AI Presentation 1
AI Presentation 1
 
ProjectReport
ProjectReportProjectReport
ProjectReport
 
Using MongoDB for Materials Discovery
Using MongoDB for Materials DiscoveryUsing MongoDB for Materials Discovery
Using MongoDB for Materials Discovery
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Artificial Intelligence and its application
Artificial Intelligence and its applicationArtificial Intelligence and its application
Artificial Intelligence and its application
 
Lecture 01 Data Mining
Lecture 01 Data MiningLecture 01 Data Mining
Lecture 01 Data Mining
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and Hype
 
Applied Machine Learning
Applied Machine LearningApplied Machine Learning
Applied Machine Learning
 

Introduction to active learning

  • 2. Mail.Ru have own Search Engine – It is not «patched Google» – Completely out product – 8.3% of market - The same engine for: - web - image - video - news - real time - e-mail search - etc. 2 1
  • 3. W We love Machine Learning :) Simplified Search Engine Architecture: ML Crawler – StaticRank: crawling priority Web ML Analyzer – Antispam – Object extraction – Antiporn – User behaviour Indexer ML – Ranking Searchers – Lingustic ML – Query processing Frontend – Antirobot 3 – Decision when to show vertical search 2
  • 4. What is Supervised Machine Learning? «Machine learning is the hot new thing» John Hennessy, President, Stanford 4 3
  • 5. Supervised Machine Learning There is unknown function Given: Set of samples Goal: Build approximation of using 5 4
  • 6. Take good features A — impossible B — possible Lenear model 6 5
  • 7. Proper model selection 7 Image from: Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer. 2006 6
  • 8. Importance of train set construction Usually: - random sampling is not the best strategy - labeling is very expensive 8 7
  • 9. Information about train set construction Modern books about ML: - Pattern Recognition and Machine Learning — 0 - Elements of Statistical Learning — 0 - An Introduction to Information Retrieval — 1 paragraph Book about TS construction: Burr Settles. Active learning. - Publication Date: 2 July 2012 9 8
  • 10. Idea of Active Learning Image from: Burr Settles. Active Learning Literature Survey. Computer 10 Sciences Technical Report 1648, University of Wisconsin–Madison. 2009. 9
  • 11. Active learning workflow Main assumption: obtaining an unlabeled instance is free Image from: Burr Settles. Active Learning Literature Survey. Computer 11 Sciences Technical Report 1648, University of Wisconsin–Madison. 2009. 10
  • 12. Sampling scenarios Image from: Burr Settles. Active Learning Literature Survey. Computer 12 Sciences Technical Report 1648, University of Wisconsin–Madison. 2009. 11
  • 13. Uncertainty Sampling Take instances about which it is least certain how to label. Least confident: Margin sampling Entropy 13 12
  • 17. Query-By-Committee Measure of committee disagreement: Vote entropy Kullback-Leibler (KL) divergence: 17 16
  • 18. Query-By-Committee (QBag) Input: T – labeled train set С – size of the committe A – learning algorithm U – set of unlabeled objects 1. Uniformly resample T, obtain T1...TС, where |Ti| < |T| 2. For each Ti build model Hi using A 3. Select x* = min x∈U | |Hi(x)=1| - |Hi(x)=0| | 4. Pass x* to oracle and update T 5. Repeat from 1until convergence 18 17
  • 19. QBag: Quality 19 K.Dwyer, R.Holte, Decision Tree Instability and Active Learning, 2007 18
  • 20. QBag: Stability 20 K.Dwyer, R.Holte, Decision Tree Instability and Active Learning, 2007 19
  • 21. Density-Weighted Methods Idea: Inhabit dense/sparse regions of the input space 21 20
  • 22. Other methods Expected Model Change Expected Error Reduction Variance Reduction Many «model dependent» methods 22 21
  • 23. Example 1: Recognition of sentence boundaries «I recommend Mitchell (1997) or Duda et al. (2001). I have strived...» Idea: 1. Make set of simple regexp-based rules like «big letter at the right» or «digits at the rigth and left» (40 rules) 2. Build classifier: Gradient Boosted Decision Trees A.Kudinov, A.Voropaev, A.Kalinin. A HIGHLY PRECISIONAL METHOD FOR THE RECOGNITION OF SENTENCE BOUNDARIES. Computational Linguistics and Intellectual Technologies. 2011 23 22
  • 24. Example 1: Recognition of sentence boundaries OpenNLP / «AOT» project Mail.Ru detector / Mail.Ru Sentence Random detector / Boundary sampling least Detector / confident Random sampling Error 41,2 % 30,4 % 8.2 % 0,8 % Rate Train set: 9820 examples Validation: 500 examples 24 23
  • 25. Example 2: Ranking formula Idea: 1. Query-Document presented by x, where: x1 – tf-idf rank x2 – query geographical x3 – geo-region of user is equal to document's region (600 other factors) 2. Build train set: 5 – vital 4 – exact answer 3 – usefull 2 – slightly usefull 1 – out of topic 0 – can't be labeled (unknown language, etc.) 3. Train ranking formula using modified LambdaRank 25 24
  • 26. Example 2: Self-organizing map Idea: mapping from N-dimentional to 2-dimentional 26 25
  • 27. Example 2: Map of train set for ranking 27 26
  • 28. Example 2: Density of QDocuments on the map High density Low density 28 27
  • 29. Example 2: Result of “SOM miss” selection Old documents New documents 29 28
  • 30. Example 2: Qbag + SOM heuristic 1. Transform regression to binary classification problem: - We should order pair of QDocuments - Apply QBag and select pairs of QDocuments 2. Apply SOM heuristic - Construct initial trainset T - Filter selected pairs A) Don't take pair B) May be C) Take pair 30 29
  • 31. Example 2: Qbag + SOM heuristic: results 31 30
  • 32. Example 3,4: Antispam, Antiporn Antispam – Neuron net – SOM balancing Antiporn – Decision trees – Uncertanty sampling 32 31
  • 33. Reference Thank you! Reference: - http://active-learning.net/ - Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison. 2009. 33 32