5. Related Works
Author (year) | Classification | Features | Algorithm
Witten et al. (1999) | Statistical | TF, TFIDF, length (bi or tri), first occurrence, node degree, etc. | Naïve Bayes
P. Turney (2000) | Statistical | Phrase frequency, position, TF, TFIDF, n-gram overlap, etc. | C4.5, GenEx
Hulth (2003) | Linguistic | Lexical and syntactic features | -
Mihalcea and Tarau (2004) | Graph Based | Unsupervised | TextRank
Medelyan et al. (2010) | Graph Based | Statistical, lexical, syntactic features | MAUI
7. Methodology
• Candidate Selection
– Extract n-grams (n = 1 to 4) that do not start or end with a stopword
– Candidates should not be proper nouns
– Candidates should not end with an adjective
– Candidates may start with an abbreviation
– Verbs are down-weighted
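
The candidate selection rules above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the input is already tokenised and POS-tagged with Penn Treebank style tags, and the stopword list, the abbreviation test, the reading of "proper noun" as "every token is tagged NNP/NNPS", and the down-weighting factor of 0.5 are all assumptions made for the example.

```python
# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "is", "are", "with"}
VERB_WEIGHT = 0.5  # assumed down-weighting factor; the slides give no value


def is_abbreviation(token):
    """Treat short all-uppercase tokens as abbreviations (an assumption)."""
    return token.isupper() and 2 <= len(token) <= 6


def select_candidates(tagged_tokens, max_n=4):
    """Return {candidate phrase: weight} for n-grams of length 1..max_n."""
    candidates = {}
    for n in range(1, max_n + 1):
        for i in range(len(tagged_tokens) - n + 1):
            gram = tagged_tokens[i:i + n]
            words = [w for w, _ in gram]
            tags = [t for _, t in gram]
            # Rule: must not start or end with a stopword.
            if words[0].lower() in STOPWORDS or words[-1].lower() in STOPWORDS:
                continue
            # Rule: must not be a proper noun, unless it starts with an abbreviation.
            all_proper = all(t in ("NNP", "NNPS") for t in tags)
            if all_proper and not is_abbreviation(words[0]):
                continue
            # Rule: must not end with an adjective.
            if tags[-1].startswith("JJ"):
                continue
            # Rule: candidates containing a verb are kept but down-weighted.
            weight = VERB_WEIGHT if any(t.startswith("VB") for t in tags) else 1.0
            phrase = " ".join(words).lower()
            candidates[phrase] = max(weight, candidates.get(phrase, 0.0))
    return candidates


# Hand-tagged toy sentence (tags are illustrative).
sample = [("LDA", "NNP"), ("topics", "NNS"), ("of", "IN"), ("London", "NNP"),
          ("improve", "VBP"), ("supervised", "JJ"), ("keyphrase", "NN"),
          ("extraction", "NN")]
candidates = select_candidates(sample)
```

Returning a weight rather than dropping verb phrases keeps them available to a later ranking step, which is one plausible reading of "down-weighted".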
8. Methodology
Category | Description
Statistical | TF, TFIDF, keyphrase length
Positional | First and last point of appearance; geographical spread (e.g., upper, middle and lower section); candidate's span
Lexical | NP, NE, n-grams
Semantic | Wikipedia lookup (frequency in Wikipedia); whether the candidate has a Wikipedia page; in-link and out-link frequency on the Wikipedia page
Semantic | LDA topic count (T = 50)
Semantic | Candidate similarity to POS-filtered words (proper nouns, verbs and adjectives)
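
As an illustration of the statistical and positional rows above, the following sketch computes TF, TFIDF, keyphrase length, first and last appearance, span, and a three-way section spread for one candidate. It is a simplified example under stated assumptions: the function and key names are hypothetical, positions are normalised by document length, the document is split into equal thirds, and the smoothed IDF formula is one common variant rather than the one used in the original system.

```python
import math


def phrase_positions(tokens, phrase):
    """Start indices at which the token list `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def candidate_features(doc_tokens, phrase, corpus):
    """Statistical and positional features for one candidate phrase.

    `corpus` is a list of tokenised documents used for document frequency.
    Returns None when the phrase does not occur in the document.
    """
    positions = phrase_positions(doc_tokens, phrase)
    if not positions:
        return None
    total = len(doc_tokens)
    tf = len(positions) / total
    df = sum(1 for doc in corpus if phrase_positions(doc, phrase))
    # Smoothed IDF (an assumed variant) avoids division by zero when df == 0.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    first = positions[0] / total
    last = positions[-1] / total
    # Count occurrences in the upper, middle and lower third of the document.
    sections = [0, 0, 0]
    for pos in positions:
        sections[min(2, pos * 3 // total)] += 1
    return {
        "tf": tf,
        "tfidf": tf * idf,
        "length": len(phrase),
        "first": first,
        "last": last,
        "span": last - first,
        "upper": sections[0],
        "mid": sections[1],
        "lower": sections[2],
    }


doc = "keyphrase extraction is fun and keyphrase extraction is useful today".split()
corpus = [doc, "topic models are useful".split()]
features = candidate_features(doc, ["keyphrase", "extraction"], corpus)
```

Each candidate's dictionary can then be turned into a row of the feature matrix for a supervised classifier.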
13. Good or Bad?
Supervised Keyphrase Extraction, Keyphrase Extraction system, supervised machine learning, Random Forest algorithm, Feature Engineering, Candidate Word, Keyphrase Extraction, Behavioural sciences, supervised classification, Keyphrase overlap
14. References
• A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 216-223. Association for Computational Linguistics, 2003.
• S. N. Kim, O. Medelyan, M.-Y. Kan, and T. Baldwin. SemEval-2010 task 5: Automatic keyphrase extraction from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 21-26. Association for Computational Linguistics, 2010.
• L. Marujo, A. Gershman, J. Carbonell, R. Frederking, and J. P. Neto. Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization. arXiv preprint arXiv:1306.4886, 2013.
• P. Turney. Learning to extract keyphrases from text. 1999.
• X. Jiang, Y. Hu, and H. Li. A ranking approach to keyphrase extraction. 2010.
• T. D. Nguyen and M.-Y. Kan. Keyphrase extraction in scientific publications. In Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317-326. Springer, 2007.
• I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. KEA: Practical automatic keyphrase extraction. In Proceedings of the Fourth ACM Conference on Digital Libraries, pages 254-255. ACM, 1999.
• R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. Association for Computational Linguistics, 2004.