Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns Using Text Mining
1. ENTER 2018 Research Track Slide Number 1
Lisa Glatzer, Julia Neidhardt and Hannes Werthner
E-Commerce TU Wien, Austria
lisa.glatzer@ec.tuwien.ac.at
http://www.ec.tuwien.ac.at
Background
• The Web has dramatically changed the tourism industry; travellers increasingly book the accommodation for their vacations online
• Web platforms aim to recommend to their customers the hotels that best fit their preferences
• However, the tourism domain is very complex
• Therefore, novel, user-centric recommendation approaches have been introduced, e.g., the Seven-Factor Model
Seven-Factor Model
• Personality-based approach: factors combining the Big Five personality traits & 17 tourist roles
• Each factor reflects a travel behavioural pattern
Sunlover
Educational
Independent
Cultural
Sportive
Riskseeker
Escapist
[Neidhardt et al., 2014]
Focus of the Work
• Analysis of hotel descriptions provided by travel operators using text mining
• Classification of hotels with different machine learning approaches
• Assignment of hotels to travel behavioural patterns (i.e., the Seven Factors)
Research Questions
(1) How can textual hotel descriptions be used to identify concepts to enable a classification of hotel descriptions?
(2) Can the identified concepts be assigned to different predefined travel behavioural patterns and, in turn, be used to deliver recommendations?
State of the Art – Tourist Roles
• [Cohen, 1972] studied people's motives for travelling & established 4 different tourist roles
• [Gibson & Yiannakis, 2002] identified 17 tourist roles (15 in their previous work) & studied the relation between age, gender, education and tourist preferences
• [Neidhardt et al., 2014/2015] present 7 different travel behavioural patterns: the “Seven Factors”
State of the Art – Text Mining to Extract Touristic Concepts
• [Lahlou et al., 2013] extract contextual attributes from TripAdvisor hotel reviews for context-aware recommendations
• [Cosh, 2013] extracts key attributes of destinations from Wikipedia articles
• [Schmunk et al., 2014] extract product properties from online reviews posted on Booking.com and TripAdvisor
Methodology (1/5)
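• Hotel descriptions provided by GIATA
• Digital information on over 364,000 hotels from 67 different tourism providers
• Text example (original German, including HTML markup):
“<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/>
<strong>Lage:</strong> <P>Umgeben von Pinienwäldern, direkt am
kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye
erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit
zahlreichen Einkaufs- und Unterhaltungsmöglichkeiten etwa 35km…”
English translation: “Board types: All Inclusive Ultra. Location: Surrounded by pine forests, situated directly on the wide, kilometre-long sandy beach of Belek. The small town of Kadriye is about 2.5km away. Belek is about 17km away, and Antalya, with numerous shopping and entertainment options, about 35km…”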
Process: Data Acquisition → Qualitative Analysis → Data Pre-Processing → Hotel Assignment → Evaluation
Methodology (2/5)
• Manual evaluation of 10 randomly selected hotels
• 20 descriptions per hotel on average
• Text length correlates with information gain
• Different providers offer similar descriptions
• “Templates”: predefined text structure
• Observations substantiated by statistical analyses (lexical diversity vs. text length)
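The slides do not say which lexical-diversity measure was used. A minimal sketch, assuming the type-token ratio (distinct words divided by total words) and a plain Pearson correlation against the token count, could look like this; the three German sample descriptions are invented for illustration.

```python
import re
from math import sqrt


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; word characters include German umlauts."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity (assumed measure): distinct words / total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sample descriptions of increasing length.
descriptions = [
    "Direkt am Strand gelegen.",
    "Direkt am Strand gelegen. Pool und Liegestühle am Pool.",
    "Direkt am Strand gelegen. Pool, Liegestühle am Pool, Restaurant am Strand und Bar am Pool.",
]
token_lists = [tokenize(d) for d in descriptions]
lengths = [len(t) for t in token_lists]
diversity = [type_token_ratio(t) for t in token_lists]
r = pearson(lengths, diversity)
print(lengths, [round(d, 2) for d in diversity], round(r, 2))
```

The type-token ratio tends to fall as texts get longer, a well-known property of the measure, so part of any negative correlation comes from the measure itself; length-corrected diversity measures are an alternative.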
Methodology (3/5)
• Extraction of HTML content & handling of text encoding
• Natural Language Processing
  • Tokenizing
  • Stopword Removal
  • Stemming
  • Pruning
• Word vector generation (TF-IDF)
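The pre-processing steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the authors' pipeline: the tiny stopword list and the suffix-stripping `stem` function are hypothetical stand-ins (a real system would use a full German stopword list and a proper stemmer such as Snowball), pruning here keeps only terms that occur in at least `min_df` descriptions, and TF-IDF is computed as (count / document length) × log(N / document frequency).

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops every tag."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True (default) decodes entities like &uuml;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical, deliberately tiny German stopword list.
STOPWORDS = {"am", "an", "und", "der", "die", "das", "im", "in", "nach", "von", "zu", "ca", "etwa", "sie"}
# Hypothetical crude suffix stripping, standing in for a real German stemmer.
SUFFIXES = ("en", "er", "es", "e", "n", "s")


def stem(token: str) -> str:
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(fragment: str) -> list[str]:
    """HTML extraction, tokenizing (letters only), stopword removal, stemming."""
    tokens = re.findall(r"[^\W\d_]+", html_to_text(fragment).lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def prune(docs: list[list[str]], min_df: int = 2) -> list[list[str]]:
    """Drop terms that occur in fewer than min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return [[t for t in doc if df[t] >= min_df] for doc in docs]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: (count / doc length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: c / len(doc) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


raw = [
    "<strong>Lage:</strong> <P>Direkt am Strand, Pool und Liegen.",
    "<strong>Lage:</strong> Im Zentrum der Stadt, nahe am Strand.",
    "<strong>Sport:</strong> Tennisplatz und Pool.",
]
pruned = prune([preprocess(fragment) for fragment in raw])
vectors = tfidf(pruned)
```

With `min_df=2`, every term that appears in only one of the three toy descriptions is pruned, which shows how aggressively a document-frequency threshold can shrink a small corpus.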
Methodology (4/5)
• Mapping of hotel descriptions to the Seven Factors using three approaches:
  1. Clustering
  2. Classification
  3. Dictionary-based approach
Methodology (5/5)
• Training, validation and evaluation with a labelled data set established by an Austrian travel operator
• Training & validation set: 371 hotels
• Test set: 180 hotels
1. Clustering (1/3)
• Unsupervised learning, fully automated
• Clustering method: K-Means
• Similarity measure: cosine similarity
• Data: training set with 371 hotels
• Number of clusters: 6 (based on various clustering evaluation coefficients)
Goal: Generation of disjoint clusters of hotel descriptions that reflect the Seven Factors
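K-Means with cosine similarity is commonly implemented as "spherical" K-Means: L2-normalise the TF-IDF vectors, assign each one to the centroid with the highest dot product, and re-normalise each centroid after averaging. The sketch below assumes sparse `dict` vectors (term to weight) and a deterministic farthest-first seeding; the slides do not say how the clusters were initialised, so the seeding is an assumption and the helper names are illustrative.

```python
import math

Vector = dict[str, float]  # sparse TF-IDF vector: term -> weight


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm else dict(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the sparser vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_direction(members: list[Vector]) -> Vector:
    """Centroid of unit vectors, re-normalised to unit length."""
    total: Vector = {}
    for v in members:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return normalize(total)


def farthest_first(docs: list[Vector], k: int) -> list[Vector]:
    """Assumed seeding: start with doc 0, then add the doc least similar to all chosen seeds."""
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(docs)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: max(cosine(docs[i], docs[j]) for j in chosen)))
    return [dict(docs[i]) for i in chosen]


def kmeans_cosine(vectors: list[Vector], k: int, max_iter: int = 50) -> list[int]:
    docs = [normalize(v) for v in vectors]
    centroids = farthest_first(docs, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: most similar centroid by cosine similarity.
        new_labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        if new_labels == labels:
            break  # converged: no hotel changed cluster
        labels = new_labels
        # Update step: re-normalised mean of each cluster's members.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_direction(members)
    return labels


vectors = [
    {"strand": 1.0, "pool": 1.0},
    {"strand": 2.0, "meer": 1.0},
    {"tennis": 1.0, "golf": 1.0},
    {"tennis": 1.0, "fitness": 2.0},
]
labels = kmeans_cosine(vectors, k=2)
```

The toy vectors form two obvious groups (beach vs. sport); with TF-IDF vectors for the 371 training hotels the call would be `kmeans_cosine(vectors, k=6)`.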
1. Clustering (2/3)
Distribution of Seven Factors across the six clusters
Cluster  Hotels
0        135
1        43
2        38
3        48
4        59
5        48
Total    371
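Once every training hotel has a cluster id, the share of each factor within each cluster can be tabulated. A minimal sketch, assuming that a hotel may carry several factor labels (one set per hotel), so that the shares within a cluster need not sum to 1; the function name and the toy data are illustrative.

```python
from collections import Counter, defaultdict


def factor_distribution(clusters: list[int], factors: list[set[str]]) -> dict[int, dict[str, float]]:
    """Share of each cluster's hotels that carry each factor label.

    Assumes multi-label hotels, so the shares in a cluster need not sum to 1.
    """
    sizes = Counter(clusters)
    counts: dict[int, Counter] = defaultdict(Counter)
    for cluster, labels in zip(clusters, factors):
        counts[cluster].update(labels)
    return {
        c: {f: n / sizes[c] for f, n in sorted(counts[c].items())}
        for c in sorted(sizes)
    }


# Toy input: cluster id and factor labels for four hotels.
clusters = [0, 0, 1, 1]
factors = [{"Sunlover"}, {"Sunlover", "Escapist"}, {"Sportive"}, set()]
dist = factor_distribution(clusters, factors)
```

A cluster dominated by a single factor would suggest that the clustering recovers that travel behavioural pattern.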
1. Clustering (3/3)
Distribution of the travel operator's factor labels
2. Classification
• Supervised learning
• Classifiers: Naive Bayes, KNN, Decision Tree
• Validation: 10-fold cross-validation
• Data: training set with 371 hotels
Goal: Generation of seven models, one for each of the Seven Factors
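With one model per factor, each factor can be treated as a separate binary problem (the hotel has the factor or not). The sketch below shows one of the three listed classifiers, a hand-rolled multinomial Naive Bayes with Laplace smoothing, evaluated by 10-fold cross-validation and reporting precision. The round-robin fold split, the smoothing value `alpha` and all function names are assumptions; KNN or a decision tree would slot into the same loop.

```python
import math
from collections import Counter

Model = dict[bool, tuple[float, dict[str, float]]]


def train_nb(docs: list[list[str]], labels: list[bool], alpha: float = 1.0) -> Model:
    """Multinomial Naive Bayes for one binary factor, with Laplace smoothing."""
    vocab = {t for d in docs for t in d}
    model: Model = {}
    for cls in (True, False):
        cls_docs = [d for d, y in zip(docs, labels) if y == cls]
        counts = Counter(t for d in cls_docs for t in d)
        total = sum(counts.values())
        prior = math.log((len(cls_docs) + alpha) / (len(docs) + 2 * alpha))
        likelihood = {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab))) for t in vocab}
        model[cls] = (prior, likelihood)
    return model


def predict_nb(model: Model, doc: list[str]) -> bool:
    """True if the factor is more probable than its absence; unseen terms are ignored."""
    def score(cls: bool) -> float:
        prior, likelihood = model[cls]
        return prior + sum(likelihood.get(t, 0.0) for t in doc)
    return score(True) > score(False)


def cross_validated_precision(docs: list[list[str]], labels: list[bool], folds: int = 10) -> float:
    """Pool predictions over the folds (assumed round-robin split); precision = TP / (TP + FP)."""
    tp = fp = 0
    for k in range(folds):
        test = set(range(k, len(docs), folds))
        train = [i for i in range(len(docs)) if i not in test]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        for i in test:
            if predict_nb(model, docs[i]):
                if labels[i]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0


# Toy data: "Sunlover"-like vs. other descriptions (already tokenised).
docs = [["strand", "pool", "meer"]] * 10 + [["tennis", "golf", "sport"]] * 10
labels = [True] * 10 + [False] * 10
print(cross_validated_precision(docs, labels))
```

In the study's setting this would presumably run seven times, once per factor, with `labels` marking whether the travel operator assigned that factor to each of the 371 training hotels.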
3. Dictionary
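• Identification of the most important words for each of the Seven Factors by experts
Goal: Classification based on the dictionary terms of each factor
Dictionary terms (German, with English glosses in parentheses):
Sunlover: Strand (beach), Swimmingpool, Wlan (Wi-Fi), Liege (lounger), Meer (sea), Beach, inclusive, Internetzugang (internet access), Liegestuhl (deck chair), Meerblick (sea view), Pool, Sonnenschirm (parasol), Sonnenterasse (sun terrace), Wifi
Educational: Buffet, Club, Halbpension (half board), inclusive, Miniclub (kids' club), Musik (music)
Independent: Eigenregie (self-organised), gemütlich (cosy), individuell (individual), lokal (local), Zentrum (centre)
Cultural: Spa, Suite, superior, elegant, Carte (à la carte), Bademantel (bathrobe), Mietsafe (rental safe), modern, Wellnessbereich (wellness area), Whirlpool
Sportive: Aktivität (activity), Fitness, Fitnessraum (gym), Sport, Tennis, Tischtennis (table tennis), Tennisplatz (tennis court), Golfplatz (golf course)
Riskseeker: Club, Stadt (city), Unterhaltung (entertainment)
Escapist: Ruhig (quiet), gemütlich (cosy), Wellness, Park, Garten (garden), Spa, Wellnessbereich (wellness area)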
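The slides do not state how dictionary matches are turned into a factor assignment. One plausible rule, shown purely as a hypothetical sketch, counts how many distinct terms of each factor's dictionary occur in a tokenised, lower-cased description and assigns every factor with at least `min_hits` matches. The terms are kept in German because the descriptions are German; in practice both the terms and the descriptions would go through the same stemming.

```python
# Expert dictionaries from the slide, lower-cased (German terms).
DICTIONARIES: dict[str, set[str]] = {
    "Sunlover": {"strand", "swimmingpool", "wlan", "liege", "meer", "beach", "inclusive",
                 "internetzugang", "liegestuhl", "meerblick", "pool", "sonnenschirm",
                 "sonnenterasse", "wifi"},
    "Educational": {"buffet", "club", "halbpension", "inclusive", "miniclub", "musik"},
    "Independent": {"eigenregie", "gemütlich", "individuell", "lokal", "zentrum"},
    "Cultural": {"spa", "suite", "superior", "elegant", "carte", "bademantel", "mietsafe",
                 "modern", "wellnessbereich", "whirlpool"},
    "Sportive": {"aktivität", "fitness", "fitnessraum", "sport", "tennis", "tischtennis",
                 "tennisplatz", "golfplatz"},
    "Riskseeker": {"club", "stadt", "unterhaltung"},
    "Escapist": {"ruhig", "gemütlich", "wellness", "park", "garten", "spa", "wellnessbereich"},
}


def dictionary_hits(tokens: list[str]) -> dict[str, int]:
    """Number of distinct dictionary terms of each factor found in a description."""
    present = set(tokens)
    return {factor: len(terms & present) for factor, terms in DICTIONARIES.items()}


def assign_factors(tokens: list[str], min_hits: int = 2) -> list[str]:
    """Hypothetical decision rule: every factor with at least min_hits matching terms."""
    return sorted(f for f, n in dictionary_hits(tokens).items() if n >= min_hits)


tokens = "direkt am strand pool meerblick ruhig garten".split()
print(dictionary_hits(tokens))
print(assign_factors(tokens))
```

Because the dictionaries differ a lot in size (3 terms for Riskseeker, 14 for Sunlover), a fixed hit threshold treats the factors unevenly; normalising by dictionary size would be one alternative.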
Classification vs. Dictionary
Validation on the training set (371 hotels)
[Figure: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0% to 100%; series: Precision Dictionary, Precision Classification]
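The comparison is made by precision per factor, and the final evaluation then uses the better approach for each factor. That selection step can be sketched as follows; the data layout (`results[factor][method]` holding predicted and true labels) and the toy values are assumptions for illustration, not the study's results.

```python
def precision(predicted: list[bool], actual: list[bool]) -> float:
    """TP / (TP + FP); 0.0 when nothing was predicted positive."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if tp + fp else 0.0


def best_method_per_factor(results: dict[str, dict[str, tuple[list[bool], list[bool]]]]) -> dict[str, str]:
    """For each factor, the method whose validation predictions have the highest precision."""
    return {
        factor: max(methods, key=lambda m: precision(*methods[m]))
        for factor, methods in results.items()
    }


# Toy validation output: (predicted, actual) per factor and method.
results = {
    "Sunlover": {
        "dictionary": ([True, True, False], [True, True, False]),
        "classification": ([True, True, True], [True, False, False]),
    },
    "Sportive": {
        "dictionary": ([True, True], [True, False]),
        "classification": ([True, False], [True, False]),
    },
}
best = best_method_per_factor(results)
print(best)
```

Selecting by precision alone favours conservative methods that assign few hotels; recall could be tracked alongside it if coverage matters.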
Final Evaluation
Evaluation of the best approaches on the independent test set
[Figure: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0% to 100%; series: Precision Test Set (180 Hotels), Precision Training Set (371 Hotels)]
Approach used: dictionary for Sunlover; classification for all other factors
Conclusion
• Allocation of hotels to tourist profiles using textual data can be implemented successfully, depending on the targeted user group:
+ Sunlover, Escapist, Cultural, Sportive
- Educational, Independent, Riskseeker
• The majority of the designed models are capable of dealing with new hotel data
• Hotel descriptions can be a reasonable basis for recommendations in recommender systems
References
• Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data, pages 77–128. Springer-Verlag New York.
• Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc.
• Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In Recommender Systems Handbook, pages 367–386. Springer US.
• Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182.
• Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. In 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 43–48.
• Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism Research, 29(2): 358–383.
• Goldberg, L. R. (1993). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1): 26–34.
• Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt. Ltd.
• Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290.
• Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers of the Department of Linguistics and Phonetics, Lund University, 53: 61–79.
• Lahlou, F. Z., Mountassir, A., Benbrahim, H. & Kassou, I. (2013). A Text Classification Based Method for Context Extraction from Online Reviews. In 8th International Conference on Intelligent Systems: Theories and Applications (SITA).
• Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown preferences. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 309–312. ACM.
• Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to recommender systems. Information Technology & Tourism, 15(1): 49–69.
• Neidhardt, J. & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the Seven-Factor Model. In Information and Communication Technologies in Tourism 2017, pages 503–515. Springer.
• Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In Recommender Systems Handbook, pages 1–35. Springer US.
• Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting Decision-Relevant Knowledge from UGC. Information and Communication Technologies in Tourism, 14(1): 253–265.
• Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi criteria review-based hotel recommendation system. In 15th IEEE International Conference on Computer and Information Technology, pages 687–691.
• Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining. Springer-Verlag London.
• Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship. Wien/New York: Springer-Verlag.
• Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J., Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology & Tourism, 15(1).
• Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. Information and Communication Technologies in Tourism, 17(1): 625–637.
• Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303.