Revisiting Evolutionary Information Filtering
Nikolaos Nanas, Centre for Research and Technology Thessaly, GREECE
Stefanos Kodovas, University of Thessaly, GREECE
Manolis Vavalis, University of Thessaly, GREECE
Outline
▪ Adaptive Information Filtering – brief introduction
▪ Evolutionary Information Filtering – review
▪ Diversity & dimensionality – theoretical issues
▪ Experimental evaluation
• Methodology – a test-bed
• Results – not a success
• Discussion – interesting observations
▪ Conclusions and future work
Information Overload is still around
Adaptive Information Filtering in the case of textual information
Adaptive Information Filtering (AIF)
▪ a challenging problem with no established solution
▪ complex and dynamic
• multiple and changing user interests
• changing information environment
▪ crucial issues for successful AIF
• profile representation
• profile adaptation
Evolutionary Information Filtering with the Vector Space Model
Profile adaptation through the evolution of the user’s profiles.
Evolutionary Information Filtering
• “A Review of Evolutionary and Immune-Inspired Information Filtering”, Natural Computing, 2009
• A common vector space with as many dimensions as the number of unique keywords
• A population of profiles that collectively represent the user’s interests
• Both profiles and documents are represented as (weighted) vectors in this space
• Trigonometric similarity measures (e.g. cosine similarity) for comparing profile vectors to document vectors
• Fitness function based on (explicit or implicit) user feedback:
• reward profiles that assign a high relevance score to relevant documents, and penalise profiles that assign a high score to non-relevant ones
• fitness is updated in proportion to user feedback
• average score of relevant documents
• ratio of successful evaluations
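The shared representation and fitness schemes summarised above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method of any reviewed system. Vectors are sparse keyword-to-weight dictionaries, and cosine similarity stands in for the trigonometric measure. Relevance feedback is a boolean. The `threshold` that defines a "successful" evaluation is a hypothetical parameter.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # sparse vector: keyword -> weight


def cosine(p: Vector, d: Vector) -> float:
    """Cosine of the angle between a profile vector and a document vector."""
    dot = sum(w * d.get(k, 0.0) for k, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_p == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_p * norm_d)


class Profile:
    """One member of the profile population (illustrative only)."""

    def __init__(self, weights: Vector) -> None:
        self.weights = weights
        self.fitness = 0.0      # feedback-proportional fitness
        self.evaluations = 0    # documents scored so far
        self.successes = 0      # evaluations that agreed with the user

    def feedback(self, doc: Vector, relevant: bool, threshold: float = 0.1) -> float:
        """Score a document, then update fitness from the user's relevance judgement."""
        score = cosine(self.weights, doc)
        # Variant A: reward high scores on relevant documents and penalise
        # them on non-relevant ones, in proportion to the score.
        self.fitness += score if relevant else -score
        # Variant B: a "success" is a score above the (hypothetical) threshold
        # on a relevant document, or below it on a non-relevant one.
        self.evaluations += 1
        if (score >= threshold) == relevant:
            self.successes += 1
        return score

    def success_ratio(self) -> float:
        return self.successes / self.evaluations if self.evaluations else 0.0


p = Profile({"oil": 1.0, "crude": 0.5})
p.feedback({"crude": 1.0, "price": 1.0}, relevant=True)
print(round(p.fitness, 3), p.success_ratio())
```

Variant A corresponds to the feedback-proportional fitness and Variant B to the ratio of successful evaluations. The "average score of relevant documents" scheme would simply divide an accumulated relevant-document score by a counter.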
Evolutionary Information Filtering
▪ profile initialisation is not random
▪ selection
• fixed percentage of best individuals
• variable percentage
• roulette wheel
▪ crossover
• single-point, two-point, three-point
• variable percentage
• roulette wheel
▪ mutation
• keyword replacement
• random weight modification
▪ steady-state replacement
• offspring typically replace less fit individuals
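Representative members of the operator families listed above can be sketched as follows. The sketch assumes a dense list of weights over a fixed keyword index, so that single-point crossover has a meaningful cut point. The following are hypothetical choices, not details taken from the reviewed systems:
- the mutation `rate` and `scale`
- the small positive shift in roulette-wheel selection
- giving offspring zero fitness

```python
import random
from typing import List, Sequence

Vector = List[float]  # dense weights over a fixed keyword index (assumption)


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Return an index chosen with probability proportional to shifted fitness."""
    low = min(fitness)
    weights = [f - low + 1e-9 for f in fitness]  # shift so every weight is positive
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]


def single_point_crossover(a: Vector, b: Vector, rng: random.Random) -> Vector:
    """Child takes a's weights before a random cut point and b's after it (len >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate_weights(v: Vector, rng: random.Random, rate: float = 0.01, scale: float = 0.1) -> Vector:
    """Random weight modification: Gaussian noise on a few weights, clipped at zero."""
    return [max(0.0, w + rng.gauss(0.0, scale)) if rng.random() < rate else w for w in v]


def replace_keyword(v: Vector, rng: random.Random) -> Vector:
    """Keyword replacement: move one weight from a used keyword to an unused one."""
    child = list(v)
    used = [i for i, w in enumerate(child) if w > 0.0]
    unused = [i for i, w in enumerate(child) if w == 0.0]
    if used and unused:
        i, j = rng.choice(used), rng.choice(unused)
        child[j], child[i] = child[i], 0.0
    return child


def steady_state_replace(population: List[Vector], fitness: List[float],
                         offspring: List[Vector]) -> None:
    """Offspring overwrite the least fit individuals, in place."""
    worst = sorted(range(len(population)), key=lambda i: fitness[i])[: len(offspring)]
    for idx, child in zip(worst, offspring):
        population[idx] = child
        fitness[idx] = 0.0  # hypothetical choice: newcomers start with zero fitness
```

In this sketch only the steady-state step touches the population in place. The other operators return new vectors, so they can be combined freely.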
Diversity Issues
▪ AIF is not a classic optimisation problem
• online learning problem
• reminiscent of Multimodal Dynamic Optimisation (MDO)
▪ Traditional GAs suffer in the case of MDO due to diversity loss.
▪ Four types of remedies:
1. adjust mutation rate when changes are observed
2. spread the population
3. memory of previous generations
4. multiple subpopulations
• in “Multimodal Dynamic Optimisation: from Evolutionary Algorithms to Artificial Immune Systems”, 2007
• intrinsic diversity problems due to:
• selection based on relative fitness
• no developmental process
• fixed population size
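Diversity loss can be monitored with a simple population statistic. The sketch below uses mean pairwise cosine distance. This is one common choice, assumed here for illustration; it is not the measure used in the cited 2007 paper. A converged population scores close to 0, and a population of mutually orthogonal profiles scores 1.

```python
import itertools
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def population_diversity(population: List[Sequence[float]]) -> float:
    """Mean pairwise cosine distance (0 = identical profiles)."""
    pairs = list(itertools.combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


converged = [[0.2, 0.8, 0.0]] * 5
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(population_diversity(converged), population_diversity(spread))
```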
Dimensionality Issues
• A vector space with a large number of dimensions (keywords) is required for successful AIF
• In a multi-dimensional space:
• the volume increases exponentially with the number of dimensions
• distance-based measures lose their meaning as points become nearly equidistant
• the discriminatory power of pairwise distances is significantly reduced
• scalar metrics cannot differentiate between vectors whose differences are spread across many dimensions and vectors whose differences are concentrated in a few
• in a multi-dimensional keyword space, the ability of GAs to achieve profile adaptation is affected because:
• the number of possible weighted keyword combinations increases exponentially with the number of dimensions
• crossover and mutation cannot produce the right combination of weighted keywords at random
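The distance-concentration effect listed above can be observed numerically. This sketch assumes points drawn uniformly from the unit hypercube rather than real (sparse, non-uniform) TFIDF vectors, so it only illustrates the trend. It measures the relative contrast between the farthest and nearest neighbours of a random query, which shrinks as the number of dimensions grows.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 100, seed: int = 42) -> float:
    """(max - min) / min of Euclidean distances from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    # Smaller values mean the nearest and farthest points are harder to tell apart.
    print(dim, round(relative_contrast(dim), 3))
```

`math.dist` requires Python 3.8 or later.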
Experimental Evaluation: Dataset
▪ Reuters-21578
• 21578 news stories that appeared on the Reuters newswire in 1987
• documents are ordered by publication date
• 135 topic categories
• experiments concentrate on the 23 topics with at least 100 relevant documents
▪ document pre-processing
• stop word removal
• stemming with Porter’s algorithm
• weighting with Term Frequency Inverse Document Frequency (TFIDF)
▪ words with a large average TFIDF are selected to build the keyword space
Topics used in the experiments (size = number of relevant documents)
topic code      size
earn            3987
acq             2448
money-fx         801
crude            634
grain            628
trade            552
interest         513
wheat            306
ship             305
corn             254
dlr              217
oilseed          192
money-supply     190
sugar            184
gnp              163
coffee           145
veg-oil          137
gold             135
nat-gas          130
soybean          120
bop              116
livestock        114
cpi              112
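The document pre-processing and keyword selection described for Reuters-21578 can be sketched as below. The sketch makes three assumptions:
- a deliberately tiny, hypothetical stop-word list;
- no stemming (Porter's algorithm is omitted to keep the example dependency-free);
- the common `tf * log(N / df)` variant of TFIDF, which may differ from the weighting used in the experiments.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}


def tokenize(text: str) -> List[str]:
    """Lower-case, keep alphabetic tokens, drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Raw term frequency times log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def select_keywords(vectors: List[Dict[str, float]], k: int) -> List[str]:
    """Keep the k terms with the largest TFIDF averaged over all documents."""
    totals: Dict[str, float] = {}
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] = totals.get(term, 0.0) + weight
    average = {term: total / len(vectors) for term, total in totals.items()}
    # Sort by descending average; break ties alphabetically for determinism.
    return sorted(average, key=lambda t: (-average[t], t))[:k]


docs = ["Crude oil prices rise", "Oil exports fall", "Wheat prices fall"]
vectors = tfidf_vectors(docs)
print(select_keywords(vectors, 4))
```

With this weighting, a term that occurs in every document gets an IDF of zero. Selecting by average TFIDF therefore favours terms that are frequent in a few documents.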
Experimental Evaluation: Baseline Experiment
Baseline Results
• as the number of extracted keywords increases, the AUP (Average Uninterpolated Precision) values increase
• for a small number of extracted keywords, the results are biased towards topics with a large number of relevant documents
• the best results are achieved when all extracted keywords are used
• to represent a range of topics, a multi-dimensional space is required
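This sketch assumes that AUP stands for Average Uninterpolated Precision in the usual TREC sense. Under that assumption, AUP can be computed from a list of documents ranked by profile score. The precision at the rank of each relevant document is summed, and the sum is divided by the total number of relevant documents, so relevant documents that are never retrieved count as zero.

```python
from typing import Sequence


def average_uninterpolated_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Average of the precision values at the rank of each relevant document.

    `ranked_relevance[i]` says whether the document at rank i + 1 is relevant.
    Dividing by `total_relevant` (not by the hits found) makes missed relevant
    documents count as zero precision.
    """
    if total_relevant <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


# Relevant documents at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_uninterpolated_precision([True, False, True, False], total_relevant=2))
```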
Experimental Evaluation: Evolutionary Experiments
▪ a vector space comprising 31298 keywords
▪ the basic Genetic Algorithm (GA):
• a population of 100 profiles
• each profile is a weighted keyword vector (randomly initialised)
• the same random initial population is used in all experiments
• documents are evaluated in order using the inner product
• new fitness = old fitness + relevance score
• the fittest 25% of profiles are selected for reproduction
• single-point crossover
• mutation through random weight modification
• the offspring replace the worst 25% of profiles
▪ two further variations of the basic GA
• GA_init: initialisation using the first 100 relevant documents per topic
• GA_init + learning: a memetic algorithm (MA) that uses Rocchio’s learning algorithm
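One generation of the basic GA set-up above can be sketched in Python, together with a Rocchio step for the GA_init + learning variant. The following details are taken from the slide: inner-product scoring in document order, accumulative fitness, the 25% selection and replacement fractions, single-point crossover and random weight modification. The following are assumptions:
- only relevant documents add to fitness;
- offspring start with zero fitness;
- the mutation parameters and the Rocchio coefficients (`alpha`, `beta`, `gamma`) are illustrative;
- a toy 6-dimensional space stands in for the 31298-keyword space.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]  # dense keyword weights


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product used as the relevance score."""
    return sum(a * b for a, b in zip(u, v))


def evaluate(pop: List[Vector], fitness: List[float], docs: List[Tuple[Vector, bool]]) -> None:
    """Score documents in order: new fitness = old fitness + relevance score.

    Assumption: only documents judged relevant contribute to fitness.
    """
    for doc, is_relevant in docs:
        if is_relevant:
            for i, profile in enumerate(pop):
                fitness[i] += inner(profile, doc)


def reproduce(pop: List[Vector], fitness: List[float], rng: random.Random,
              frac: float = 0.25, mut_rate: float = 0.01, mut_scale: float = 0.1) -> None:
    """The fittest `frac` breed; their offspring replace the worst `frac`, in place."""
    k = max(1, int(len(pop) * frac))
    order = sorted(range(len(pop)), key=lambda i: fitness[i], reverse=True)
    parents = [pop[i] for i in order[:k]]
    offspring = []
    for _ in range(k):
        a, b = rng.sample(parents, 2) if k > 1 else (parents[0], parents[0])
        point = rng.randrange(1, len(a))          # single-point crossover
        child = a[:point] + b[point:]
        child = [max(0.0, w + rng.gauss(0.0, mut_scale)) if rng.random() < mut_rate else w
                 for w in child]                  # random weight modification
        offspring.append(child)
    for idx, child in zip(order[-k:], offspring):
        pop[idx] = child
        fitness[idx] = 0.0                        # assumption: newcomers start at zero


def rocchio(profile: Vector, relevant: List[Vector], non_relevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move a profile towards the relevant centroid and away from the non-relevant one."""
    def centroid(docs: List[Vector]) -> Vector:
        if not docs:
            return [0.0] * len(profile)
        return [sum(col) / len(docs) for col in zip(*docs)]
    rel, non = centroid(relevant), centroid(non_relevant)
    return [max(0.0, alpha * p + beta * r - gamma * n) for p, r, n in zip(profile, rel, non)]


rng = random.Random(7)
dim = 6
pop = [[rng.random() for _ in range(dim)] for _ in range(100)]  # random initialisation
fitness = [0.0] * len(pop)
docs = [([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], True), ([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], False)]
use_learning = False  # True turns the GA into the memetic (GA + Rocchio) variant
for _ in range(5):
    evaluate(pop, fitness, docs)
    if use_learning:
        relevant = [d for d, r in docs if r]
        non_relevant = [d for d, r in docs if not r]
        pop[:] = [rocchio(p, relevant, non_relevant) for p in pop]
    reproduce(pop, fitness, rng)
```

Note that nothing in `reproduce` steers individual keyword weights towards the relevant documents; only the Rocchio step does.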
Comparative Results: accuracy
• y-axis: best AUP achieved in 50 generations (a comparison biased in favour of the GA)
• baseline results are included
• additional results for ranking by date
Findings:
• the GA performs worse than the baseline
• marginal improvements for non-random initialisation
• significant improvement when learning is introduced
• the MA outperforms the baseline only for some topics of small size
Comparative Results: learning
• y-axis: average AUP over all topics after each generation
• x-axis: number of generations
• the embedded figure focuses on GA and GA_init
Findings:
• the GA shows essentially no improvement
• better initial performance and learning rate for non-random initialisation (GA_init)
• much steeper learning curve when learning is introduced (GA_init + learning)
Conclusions
▪ The basic GA fails to learn the topic of interest:
• the right combination of keyword weights cannot be produced at random
• the GA lacks a mechanism for appropriately updating keyword weights
• performance depends on the weighted keywords produced by initialisation
▪ When the GA is initialised from relevant documents:
• the initial set of weighted keywords produces better filtering results
▪ The introduction of learning allows further improvement of the initial keyword weights:
• still worse than the baseline experiment, despite the 50 generations
• possibly due to the negative effect of the genetic operators
Discussion
▪ Our experimental results do not agree with the promising results reported in the literature
• we did not re-implement an existing approach, but adopted existing techniques
• AIF is a complex problem that cannot easily be tackled with weighted keyword vectors in a multi-dimensional space
• comparative experiments between GAs and other machine learning algorithms have been missing from AIF research
▪ Large differences were observed between the GA and the baseline algorithm
• despite the comparison being biased in favour of the GA
• more fundamental alternatives, not based on vector representations, should be considered
• the choice of representation should facilitate the learning task
• external remedies like those adopted for MDO are not practical
▪ We wish to revive the research community’s interest in AIF
• biologically inspired solutions are well suited to the problem
• appropriate experimental methodologies that reflect the complexity and dynamics of AIF are required

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Revisiting evolutionary information filtering

  • 1. Revisiting Evolutionary Information Filtering
      Nikolaos Nanas, Centre for Research and Technology Thessaly, GREECE
      Stefanos Kodovas, University of Thessaly, GREECE
      Manolis Vavalis, University of Thessaly, GREECE
  • 2. Outline
      • Adaptive Information Filtering: brief introduction
      • Evolutionary Information Filtering: review
      • Diversity & dimensionality: theoretical issues
      • Experimental evaluation
          • Methodology: a test-bed
          • Results: not a success
          • Discussion: interesting observations
      • Conclusions and future work
  • 3. Information Overload is still around
  • 4. Adaptive Information Filtering in the case of textual information
  • 5. Adaptive Information Filtering (AIF)
      • challenging problem with no established solution
      • complex and dynamic:
          • multiple and changing user interests
          • changing information environment
      • crucial issues for successful AIF:
          • profile representation
          • profile adaptation
  • 6. Evolutionary Information Filtering with the Vector Space Model
      Profile adaptation through evolution of the user’s profiles.
  • 7. Evolutionary Information Filtering
      • “A Review of Evolutionary and Immune-Inspired Information Filtering”, Natural Computing, 2009
      • A common vector space with as many dimensions as the number of unique keywords
      • A population of profiles that collectively represent the user’s interests
      • Both profiles and documents are represented as (weighted) vectors in this space
      • Trigonometric measures of similarity (e.g. the cosine) for comparing profile vectors to document vectors
      • A fitness function based on (explicit or implicit) user feedback:
          • reward profiles that assigned a high relevance score to relevant documents, and vice versa
          • fitness is updated in proportion to user feedback
          • average score of relevant documents
          • ratio of successful evaluations
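A minimal sketch of the matching and fitness scheme on slide 7, assuming sparse vectors stored as Python dicts from keyword to weight and using cosine similarity as the trigonometric measure. It tracks two of the listed fitness variants (average score of relevant documents, ratio of successful evaluations). The acceptance threshold that decides whether an evaluation counts as "successful" is an assumption, and `cosine` and `ProfileFitness` are hypothetical names, not code from the reviewed systems.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse keyword vectors."""
    # Only keywords present in both vectors contribute to the dot product.
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class ProfileFitness:
    """Hypothetical tracker for two of the fitness variants on the slide."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold  # assumed acceptance threshold (hypothetical)
        self.relevant_scores: list[float] = []
        self.successes = 0
        self.evaluations = 0

    def update(self, score: float, relevant: bool) -> None:
        """Record one evaluated document and the user's feedback on it."""
        if relevant:
            self.relevant_scores.append(score)
        # "Successful" = the accept/reject decision agrees with the feedback.
        accepted = score >= self.threshold
        self.successes += int(accepted == relevant)
        self.evaluations += 1

    def average_relevant_score(self) -> float:
        if not self.relevant_scores:
            return 0.0
        return sum(self.relevant_scores) / len(self.relevant_scores)

    def success_ratio(self) -> float:
        return self.successes / self.evaluations if self.evaluations else 0.0
```

For example, `cosine({"oil": 0.8, "crude": 0.6}, {"crude": 1.0, "price": 1.0})` gives 0.6/√2 ≈ 0.424, since "crude" is the only shared keyword.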
  • 8. Evolutionary Information Filtering
      • profile initialisation is not random
      • selection:
          • fixed percentage of best individuals
          • variable percentage
          • roulette wheel
      • crossover:
          • single-point, two-point, three-point
          • variable percentage
          • roulette wheel
      • mutation:
          • keyword replacement
          • random weight modification
      • steady-state replacement:
          • offspring typically replace less fit individuals
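The operators listed on slide 8 can be sketched as below. This assumes a profile is a fixed-length list of `(keyword, weight)` genes, so that crossover and keyword replacement act position by position. The even split between the two mutation types, the Gaussian weight perturbation, and all function names are assumptions for illustration. Roulette-wheel selection here needs non-negative fitness values that are not all zero.

```python
import random

# A profile is a fixed-length list of (keyword, weight) genes.
Profile = list[tuple[str, float]]


def roulette_select(population: list[Profile], fitness: list[float],
                    k: int, rng: random.Random) -> list[Profile]:
    """Fitness-proportionate selection with replacement (fitness must be >= 0)."""
    return rng.choices(population, weights=fitness, k=k)


def n_point_crossover(a: Profile, b: Profile, n: int,
                      rng: random.Random) -> tuple[Profile, Profile]:
    """Swap alternate segments between n distinct cut points (n = 1, 2, 3, ...)."""
    cuts = sorted(rng.sample(range(1, len(a)), n))
    child_a: Profile = []
    child_b: Profile = []
    start, swap = 0, False
    for cut in cuts + [len(a)]:
        first, second = (b, a) if swap else (a, b)
        child_a.extend(first[start:cut])
        child_b.extend(second[start:cut])
        start, swap = cut, not swap
    return child_a, child_b


def mutate(profile: Profile, vocabulary: list[str], rate: float,
           sigma: float, rng: random.Random) -> Profile:
    """Per gene, with probability `rate`, replace the keyword or perturb the weight."""
    mutated: Profile = []
    for keyword, weight in profile:
        if rng.random() < rate:
            if rng.random() < 0.5:  # assumed 50/50 split between mutation types
                keyword = rng.choice(vocabulary)  # keyword replacement
            else:
                weight = max(0.0, weight + rng.gauss(0.0, sigma))  # weight modification
        mutated.append((keyword, weight))
    return mutated


def steady_state_replace(population: list[Profile], fitness: list[float],
                         offspring: list[Profile],
                         offspring_fitness: list[float]):
    """Offspring overwrite the least fit individuals; population size is unchanged."""
    worst_first = sorted(range(len(population)), key=lambda i: fitness[i])
    population, fitness = list(population), list(fitness)
    for slot, child, child_fit in zip(worst_first, offspring, offspring_fitness):
        population[slot] = child
        fitness[slot] = child_fit
    return population, fitness
```

Passing `n=1`, `2` or `3` to `n_point_crossover` gives the single-, two- and three-point variants named on the slide.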
  • 9. Diversity Issues
      • AIF is not a classic optimisation problem:
          • it is an online learning problem
          • reminiscent of Multimodal Dynamic Optimisation (MDO)
      • Traditional GAs suffer in the case of MDO due to loss of diversity.
      • Four types of remedies:
          1. adjust the mutation rate when changes are observed
          2. spread the population
          3. keep a memory of previous generations
          4. use multiple subpopulations
      • in “Multimodal Dynamic Optimisation: from Evolutionary Algorithms to Artificial Immune Systems”, 2007:
          • intrinsic diversity problems due to:
              • selection based on relative fitness
              • no developmental process
              • fixed population size
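Remedy 1 and the diversity loss it targets can be made concrete with two small helpers. This is a sketch under stated assumptions: diversity is measured as the mean pairwise cosine distance between dense profile vectors, and a drop in the best fitness between generations is taken as the signal that the user's interests have changed. The base and raised mutation rates are arbitrary placeholders, and both function names are hypothetical.

```python
import itertools
import math


def mean_pairwise_cosine_distance(population: list[list[float]]) -> float:
    """Population diversity as the mean of (1 - cosine) over all profile pairs."""
    def cos(u: list[float], v: list[float]) -> float:
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        return dot / (nu * nv) if nu and nv else 0.0

    pairs = list(itertools.combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cos(u, v) for u, v in pairs) / len(pairs)


def triggered_mutation_rate(prev_best: float, best: float,
                            base_rate: float = 0.01,
                            raised_rate: float = 0.2) -> float:
    """Raise the mutation rate when the best fitness drops (assumed change signal)."""
    return raised_rate if best < prev_best else base_rate
```

A population whose diversity approaches 0 has collapsed onto one region of the keyword space, which is the situation the four remedies try to avoid.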
  • 10. Dimensionality Issues
      • A vector space with a large number of dimensions (keywords) is required for successful AIF
      • In a multi-dimensional space:
          • the volume increases exponentially with the number of dimensions
          • distance-based measures become meaningless as points become equidistant
          • the discriminatory power of pair-wise distances is significantly affected
          • scalar metrics cannot differentiate between vectors with distributed and concentrated differences
      • In a multi-dimensional keyword space, the ability of GAs to achieve profile adaptation is affected because:
          • the number of possible weighted keyword combinations increases exponentially with the number of dimensions
          • crossover and mutation cannot randomly produce the right combination of weighted keywords
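The claim on slide 10 that points become nearly equidistant can be checked numerically. This sketch assumes points drawn uniformly from the unit hypercube and Euclidean distance. Sparse TFIDF keyword vectors behave differently, so it illustrates the trend rather than reproducing the authors' setting. It reports the relative contrast (max - min)/min of the distances from a random query to random points. `relative_contrast` is a hypothetical helper.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [math.dist(query, [rng.random() for _ in range(dim)])
                 for _ in range(n_points)]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

The printed contrast should shrink sharply as `dim` grows: the nearest and farthest points end up at almost the same distance from the query.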
  • 11. Experimental Evaluation: Dataset
      • Reuters-21578
          • 21578 news stories that appeared on the Reuters newswire in 1987
          • documents are ordered according to publication date
          • 135 topic categories
          • experiments concentrate on the 23 topics with at least 100 relevant documents
      • Document pre-processing:
          • stop word removal
          • stemming with Porter’s algorithm
          • weighting with Term Frequency Inverse Document Frequency (TFIDF)
      • Words with a large average TFIDF are selected to build the keyword space

          topic code     size  |  topic code     size
          earn           3987  |  money-supply    190
          acq            2448  |  sugar           184
          money-fx        801  |  gnp             163
          crude           634  |  coffee          145
          grain           628  |  veg-oil         137
          trade           552  |  gold            135
          interest        513  |  nat-gas         130
          wheat           306  |  soybean         120
          ship            305  |  bop             116
          corn            254  |  livestock       114
          dlr             217  |  cpi             112
          oilseed         192  |
          (size = number of relevant documents for the topic)
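The pre-processing and keyword selection on slide 11 can be sketched as follows. The slides leave several details open, so the following are assumptions:

* Tokens are already lower-cased and Porter-stemmed. Stemming itself is not implemented here.
* The stop-word list is a tiny placeholder.
* TFIDF is raw term frequency times log(N/df).
* "Average TFIDF" is taken over all documents, counting zero for documents that lack the term.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to"}  # placeholder list


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw TF x log(N / df) per document, after stop-word removal.

    Tokens are assumed to be lower-cased and already Porter-stemmed.
    """
    docs = [[t for t in doc if t not in STOP_WORDS] for doc in docs]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]


def select_keywords(vectors: list[dict[str, float]], k: int) -> list[str]:
    """Return the k terms with the largest TFIDF averaged over all documents."""
    totals: Counter = Counter()
    for vec in vectors:
        totals.update(vec)  # adds the float weights term by term
    averages = {t: w / len(vectors) for t, w in totals.items()}
    # Sort by descending average; ties are broken alphabetically.
    return sorted(averages, key=lambda t: (-averages[t], t))[:k]
```

Terms that occur in every document get an IDF of zero, so they can never be selected, which matches the intent of keeping discriminative keywords.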
  • 13. Baseline Results
      • as the number of extracted keywords increases, the AUP values increase
      • for a small number of extracted keywords, the results are biased towards topics with a large number of relevant documents
      • the best results are achieved when all extracted keywords are used
      • if we wish to represent a range of topics, then a multi-dimensional space is required
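The baseline results are reported as AUP, which these slides do not define. Assuming AUP means Average Uninterpolated Precision, it is the sum of the precision values at each rank where a relevant document appears, divided by the total number of relevant documents. Under that assumption it can be computed from a ranked list of relevance judgements as below. The function name is hypothetical.

```python
from typing import Optional


def average_uninterpolated_precision(ranked_relevance: list[bool],
                                     total_relevant: Optional[int] = None) -> float:
    """Sum of precision at each rank holding a relevant document, over total relevant.

    `ranked_relevance[i]` is True if the document at rank i + 1 is relevant.
    `total_relevant` defaults to the number of relevant documents in the list.
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant
```

Consider a ranking whose relevance pattern is relevant, non-relevant, relevant, non-relevant. The precisions at the relevant ranks are 1/1 and 2/3, so AUP = (1 + 2/3)/2 ≈ 0.833.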
  • 14. Experimental Evaluation: Evolutionary Experiments
      • a vector space comprising 31298 keywords
      • The basic Genetic Algorithm:
          • a population of 100 profiles
          • each profile is a weighted keyword vector (randomly initialised)
          • the same random initial population is used in all experiments
          • documents are evaluated in order using the inner product
          • new fitness = old fitness + relevance score
          • the 25% fittest profiles are selected for reproduction
          • single-point crossover
          • mutation through random weight modification
          • the offspring replace the 25% worst profiles
      • Two further variations of the basic GA:
          • GA_init: initialisation using the first 100 relevant documents per topic
          • GA_init + learning: a memetic algorithm (MA) that uses Rocchio’s learning algorithm
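Here is a compact sketch of the basic GA on slide 14. These parts follow the slide: the population size, inner-product scoring, additive fitness, 25% selection, single-point crossover, random weight modification, and replacement of the 25% worst profiles. Everything else is an assumption:

* Profiles and documents are dense weight lists over a small keyword space. The experiments used 31298 keywords.
* Fitness is recomputed from zero each generation over the feedback documents.
* Parents are paired at random.
* The mutation rate and Gaussian step size are placeholders.

```python
import random

POP_SIZE = 100   # population of 100 profiles (from the slide)
FRACTION = 0.25  # 25% fittest reproduce; offspring replace the 25% worst


def inner_product(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def evaluate(population: list[list[float]],
             documents: list[list[float]]) -> list[float]:
    """Score documents in order; new fitness = old fitness + relevance score."""
    # Assumption: `documents` are feedback documents judged relevant, and
    # fitness is recomputed from zero each generation.
    fitness = [0.0] * len(population)
    for doc in documents:
        for i, profile in enumerate(population):
            fitness[i] += inner_product(profile, doc)
    return fitness


def reproduce(population: list[list[float]], fitness: list[float],
              rng: random.Random, mutation_rate: float = 0.01,
              sigma: float = 0.1) -> None:
    """One generation in place: select, single-point crossover, mutate, replace."""
    k = int(len(population) * FRACTION)
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    parents = [population[i] for i in ranked[:k]]
    offspring = []
    for _ in range(k):
        a, b = rng.sample(parents, 2)   # random pairing (assumption)
        cut = rng.randrange(1, len(a))  # single-point crossover
        child = a[:cut] + b[cut:]
        # Random weight modification, keeping weights non-negative.
        child = [max(0.0, w + rng.gauss(0.0, sigma))
                 if rng.random() < mutation_rate else w for w in child]
        offspring.append(child)
    for slot, child in zip(ranked[-k:], offspring):  # replace the 25% worst
        population[slot] = child


def basic_ga(documents: list[list[float]], dim: int,
             generations: int = 50, seed: int = 0):
    """Run the basic GA; return the final population and best fitness per generation."""
    rng = random.Random(seed)  # fixed seed: same random initial population every run
    population = [[rng.random() for _ in range(dim)] for _ in range(POP_SIZE)]
    best_per_generation = []
    for _ in range(generations):
        fitness = evaluate(population, documents)
        best_per_generation.append(max(fitness))
        reproduce(population, fitness, rng)
    return population, best_per_generation
```

This operator set has no step that moves weights towards the feedback documents. Weights only change through recombination and random perturbation.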
  • 15. Comparative Results: accuracy
      • y-axis: best AUP achieved in 50 generations (bias)
      • baseline results are included
      • additional results for ranking by date
      • Findings:
          • the GA performs worse than the baseline
          • marginal improvements for non-random initialisation
          • significant improvement when learning is introduced
          • the MA is better only for some topics of small size
  • 16. Comparative Results: learning
      • y-axis: average AUP over all topics after each generation
      • x-axis: number of generations
      • the embedded figure focuses on GA and GA_init
      • Findings:
          • the GA does not essentially improve
          • better initial performance and learning rate for non-random initialisation (GA_init)
          • a much steeper learning curve when learning is introduced (GA_init + learning)
  • 17. Conclusions
      • The basic GA fails to learn the topic of interest:
          • the right combination of keyword weights cannot be produced randomly
          • the GA lacks a mechanism for appropriately updating keyword weights
          • performance depends on the weighted keywords that initialisation produced
      • When the GA is initialised based on relevant documents, the initial set of weighted keywords produces better filtering results
      • The introduction of learning allows for further improvements in the initial keyword weights:
          • still worse than the baseline experiment, despite the 50 generations
          • this is possibly due to the negative effect of the genetic operations
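Random weight modification perturbs weights blindly, whereas a learning step updates keyword weights directly from feedback. Slide 14 says the GA_init + learning variant uses Rocchio's algorithm but gives no details. The sketch below therefore uses the standard textbook form q' = αq + β·mean(relevant) - γ·mean(non-relevant) with the commonly cited defaults α = 1, β = 0.75 and γ = 0.15, and clips negative weights to zero. These choices, and the function name, are assumptions.

```python
def rocchio_update(profile: dict[str, float],
                   relevant: list[dict[str, float]],
                   non_relevant: list[dict[str, float]],
                   alpha: float = 1.0, beta: float = 0.75,
                   gamma: float = 0.15) -> dict[str, float]:
    """Rocchio: alpha * profile + beta * mean(relevant) - gamma * mean(non-relevant)."""
    updated = {term: alpha * w for term, w in profile.items()}
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                # Dividing by len(docs) turns the sum into a centroid (mean vector).
                updated[term] = updated.get(term, 0.0) + coeff * w / len(docs)
    # Negative weights are clipped by dropping those terms.
    return {term: w for term, w in updated.items() if w > 0.0}
```

For example, updating the profile `{"oil": 1.0}` with one relevant document `{"oil": 1.0, "crude": 2.0}` and one non-relevant document `{"wheat": 1.0}` yields `{"oil": 1.75, "crude": 1.5}`. The weight for "wheat" becomes negative and is dropped.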
  • 18. Discussion
      • Our experimental results do not agree with the promising results reported in the literature:
          • we did not re-implement an existing approach, but adopted existing techniques
          • AIF is a complex problem that cannot easily be tackled with weighted keywords in a multi-dimensional space
          • comparative experiments between GAs and other machine learning algorithms have been missing from AIF
      • Large differences were observed between the GA and the baseline algorithm:
          • despite the comparison being biased in favour of the GA
          • more fundamental alternatives that are not based on vector representations
          • the choice of representation should facilitate the learning task
          • external remedies like those adopted for MDO are not practical
      • We wish to reanimate the interest of the research community in AIF:
          • biologically inspired solutions are well suited to the problem
          • appropriate experimental methodologies that reflect the complexity and dynamics of AIF are required

Speaker notes

  1. The Web is nowadays a network of transmitters and receivers of information. Various Web channels, such as email, newsgroups and forums, social networks, and technologies as simple as Really Simple Syndication (RSS), contribute to the large amount of information. Information overload, already a concern in the 80s, is still a central issue.
  3. Why do I like these kinds of pictures, music, or news? Research interest has declined since 2000, for good reasons. Nowadays there is high demand, in particular from WWW businesses.
  4. In the early 90s, evolutionary computation (EC) remained outside the mainstream.