Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphrase Extraction System

A SUPERVISED KEYPHRASE EXTRACTION SYSTEM
Semantics 2016, Leipzig
Kolawole J. Adebayo
Luigi Di Caro
Guido Boella

Outline
• Introduction
• Related Works
• Methodology
• Experiments
• Conclusions

Introduction
• Keyphrases
– What?
– Why?
• Document indexing, document summarization, clustering and visualization
• Keyphrase assignment vs. keyphrase extraction
• Unsupervised vs. supervised
• Classification vs. ranking

Introduction
Semantic Features, Supervised Keyphrase Extraction, Keyphrase, Keyword
Related Works
Work | Classification | Features | Algorithm
Witten et al. (1999) | Statistical | TF, TFIDF, length (bi or tri), first occurrence, node degree, etc. | Naïve Bayes
P. Turney (2000) | Statistical | Phrase frequency, position, TF, TFIDF, n-gram overlap, etc. | C4.5, GenEx
Hulth (2003) | Linguistic | Lexical and syntactic features | -
Mihalcea and Tarau (2004) | Graph based | Unsupervised | TextRank
Medelyan et al. (2010) | Graph based | Statistical, lexical, syntactic features | MAUI

Methodology
Training Document -> Select Candidate -> Extract Feature for Candidates -> Combine features with Classifier
Training Document -> Select Candidate -> Extract Feature for Candidates -> Predictor -> Extracted Keyphrases

Methodology
• Candidate Selection
– Extract n-grams (n = 1-4) that do not start or end with a stopword
– Candidates should not be proper nouns
– Candidates should not end with an adjective
– Candidates may start with an abbreviation
– Verbs are down-weighted
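The first rule above can be sketched in a few lines. This is only an illustration of the stopword filter on n-grams of length 1 to 4; the stopword list and the function name `candidate_ngrams` are assumptions, and the part-of-speech rules (proper nouns, adjectives, abbreviations, verb down-weighting) are left out because the slides do not say which tagger is used.

```python
# Tiny illustrative stopword list (an assumption; the slides do not name one).
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "with"}


def candidate_ngrams(tokens, min_n=1, max_n=4, stopwords=STOPWORDS):
    """Return n-grams (min_n..max_n tokens) that neither start nor end with a stopword."""
    candidates = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Reject the n-gram if its first or last token is a stopword.
            if gram[0].lower() in stopwords or gram[-1].lower() in stopwords:
                continue
            candidates.append(" ".join(gram))
    return candidates


tokens = "extraction of keyphrases from the text".split()
print(candidate_ngrams(tokens))
```

Stopwords inside a candidate are still allowed, so a phrase such as "extraction of keyphrases" survives while "of keyphrases" does not.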

Methodology
Category | Description
Statistical | TF, TFIDF, keyphrase length
Positional | First and last point of appearance; geographical spread (e.g., upper, middle and lower section); candidate's span
Lexical | NP, NE, n-grams
Semantic | Wikipedia lookup (frequency in Wikipedia), whether the candidate has a Wikipedia page, in-/out-link frequency on the Wikipedia page
Semantic | LDA topic count (T=50)
Semantic | Candidate similarity to POS-filtered words (proper nouns, verbs and adjectives)
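To make the statistical and positional rows more concrete, here is a minimal sketch of how such features might be computed for one candidate. The exact definitions are assumptions, not taken from the slides: positions are normalised by document length, the span is the distance between first and last occurrence, the section is the document third in which the candidate first appears, and the IDF uses a smoothed variant. The names `find_occurrences` and `candidate_features` are hypothetical.

```python
import math


def find_occurrences(tokens, phrase):
    """Return the start indices at which the phrase (a token list) occurs in tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def candidate_features(candidate, doc_tokens, corpus):
    """Statistical and positional features for one candidate phrase.

    corpus is a list of tokenized documents, used only for document frequency.
    """
    phrase = candidate.split()
    hits = find_occurrences(doc_tokens, phrase)
    tf = len(hits) / max(len(doc_tokens), 1)
    df = sum(1 for doc in corpus if find_occurrences(doc, phrase))
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # smoothed IDF (assumed variant)
    # Relative positions in [0, 1]; 1.0 is used when the candidate never occurs.
    first = hits[0] / len(doc_tokens) if hits else 1.0
    last = hits[-1] / len(doc_tokens) if hits else 1.0
    return {
        "tf": tf,
        "tfidf": tf * idf,
        "length": len(phrase),
        "first_pos": first,
        "last_pos": last,
        "span": last - first,
        "section": min(int(first * 3), 2),  # 0 = upper, 1 = middle, 2 = lower
    }


doc = "keyphrase extraction is useful . supervised keyphrase extraction uses features".split()
corpus = [doc, "random forest classifier".split()]
print(candidate_features("keyphrase extraction", doc, corpus))
```

The resulting dictionary is one row of a feature matrix; the lexical and semantic rows of the table would add further columns.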

Methodology
POS-filtered n-grams (2, 3, 4) + Candidate keyphrase -> Embedding Similarity
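One plausible reading of this step is a similarity score between the embedding of a candidate and the embeddings of the POS-filtered n-grams. The sketch below is an assumption throughout: it averages word vectors to represent a phrase, uses cosine similarity, and keeps the maximum score. The toy vectors stand in for a trained embedding model, and the slides do not say which model or aggregation is actually used.

```python
import math

# Toy 3-dimensional word vectors; hypothetical stand-ins for a trained embedding model.
TOY_VECTORS = {
    "keyphrase": [1.0, 0.0, 0.0],
    "extraction": [0.8, 0.2, 0.0],
    "random": [0.0, 1.0, 0.0],
    "forest": [0.0, 0.9, 0.1],
}


def phrase_vector(phrase, vectors):
    """Average the vectors of the known words in a phrase; None if no word is known."""
    words = [w for w in phrase.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][k] for w in words) / len(words) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_similarity(candidate, filtered_ngrams, vectors):
    """Highest cosine similarity between the candidate and any POS-filtered n-gram."""
    cand = phrase_vector(candidate, vectors)
    if cand is None:
        return 0.0
    others = (phrase_vector(g, vectors) for g in filtered_ngrams)
    sims = [cosine(cand, vec) for vec in others if vec is not None]
    return max(sims, default=0.0)


print(embedding_similarity("keyphrase extraction", ["random forest"], TOY_VECTORS))
```

Taking the maximum rewards a candidate that is close to at least one filtered n-gram; a mean over all n-grams would be an equally reasonable choice under these assumptions.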

Results
Features | Dataset | Precision | Recall | F-Measure
Medelyan et al. (2010) | Marujo | 49.4 | - | -
Marujo et al. (2013) | Marujo | 55.4 | - | -
All-features | Marujo | 58.3 | 42.0 | 48.8
Selected-features | Marujo | 48.7 | 36.5 | 41.7
Table 1: Evaluation result on Marujo dataset
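For reference, the three reported metrics can be computed as in the sketch below. It assumes exact-match comparison between extracted and gold keyphrases and the balanced F-measure (harmonic mean of precision and recall); the slides do not state the matching criterion, and the function names are illustrative. Under that assumption, the All-features row of Table 1 (P = 58.3, R = 42.0) gives an F-measure that rounds to 48.8.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F-measure for two keyphrase collections."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Balanced F-measure for the All-features row of Table 1.
print(round(f_measure(58.3, 42.0), 1))

# Toy example: 2 of 4 extracted phrases are among the 3 gold phrases.
print(precision_recall_f1(
    ["keyphrase extraction", "random forest", "feature engineering", "text"],
    ["keyphrase extraction", "random forest", "supervised learning"],
))
```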

Results
Features | Dataset | Precision | Recall | F-Measure
Selected-features | Combined | 29.9 | 20.3 | 16.9
Selected-features | Reader | 26.4 | 17.1 | 20.7
All-features | Combined | 32.7 | 21.0 | 25.5
All-features | Reader | 30.2 | 18.1 | 22.6
Table 2: Evaluation result on SemEval dataset

Results
Features | Dataset | Precision | Recall | F-Measure
(2,5,6,7,8,9) | Combined | 32.1 | 20.6 | 25.0
(1,2,5,7,8,9) | Combined | 31.8 | 20.1 | 24.7
(2,4,5,7,8,9) | Combined | 30.2 | 17.7 | 22.3
(3,4,6,7,8,9) | Combined | 27.4 | 16.3 | 20.4
Table 3: Ablation test on SemEval dataset

Good or Bad?
Supervised Keyphrase Extraction, Keyphrase Extraction system, supervised machine learning, Random Forest algorithm, Feature Engineering, Candidate Word, Keyphrase Extraction, Behavioural sciences, supervised classification, Keyphrase overlap

References
• A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 216-223. Association for Computational Linguistics, 2003.
• S. N. Kim, O. Medelyan, M.-Y. Kan, and T. Baldwin. SemEval-2010 task 5: Automatic keyphrase extraction from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 21-26. Association for Computational Linguistics, 2010.
• L. Marujo, A. Gershman, J. Carbonell, R. Frederking, and J. P. Neto. Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization. arXiv preprint arXiv:1306.4886, 2013.
• P. Turney. Learning to extract keyphrases from text. 1999.
• X. Jiang, Y. Hu, and H. Li. A ranking approach to keyphrase extraction. 2010.
• T. D. Nguyen and M.-Y. Kan. Keyphrase extraction in scientific publications. In Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317-326. Springer, 2007.
• I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. KEA: Practical automatic keyphrase extraction. In Proceedings of the Fourth ACM Conference on Digital Libraries, pages 254-255. ACM, 1999.
• R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. Association for Computational Linguistics, 2004.

Conclusions
• Many Thanks For The Attention!!!
