ENTER 2018 Research Track Slide Number 1
Automated Assignment of
Hotel Descriptions to
Travel Behavioural Patterns
Lisa Glatzer, Julia Neidhardt and Hannes Werthner
E-Commerce TU Wien, Austria
lisa.glatzer@ec.tuwien.ac.at
http://www.ec.tuwien.ac.at
Background
• The Web has dramatically changed the tourism
industry; travellers increasingly book their
holiday accommodation online
• Web platforms aim to recommend hotels to
their customers that best fit their preferences
• However, the tourism domain is very complex
• Therefore, novel, user-centric recommendation
approaches have been introduced, e.g., the
Seven-Factor Model
Seven-Factor Model
• Personality-based approach: factors combining
Big Five personality traits & 17 tourist roles
• Each factor reflects travel behavioural patterns
Sunlover
Educational
Independent
Cultural
Sportive
Riskseeker
Escapist
[Neidhardt et al., 2014]
Focus of the Work
• Text-mining analysis of hotel descriptions
written by travel operators
• Classification of hotels with different
machine learning approaches
• Assignment of hotels to travel behavioural
patterns (i.e., seven factors)
Research Questions
(1) How can textual hotel descriptions be
used to identify concepts to enable a
classification of hotel descriptions?
(2) Can the identified concepts be assigned
to different predefined travel
behavioural patterns and, in turn, be
used to deliver recommendations?
State of the Art – Tourist Roles
• [Cohen, 1972] studied people's motives for
travelling & established 4 different tourist roles
• [Gibson & Yiannakis, 2002] identified 17
tourist roles (15 in their previous work) &
studied the relation between age, gender,
education and tourist preferences
• [Neidhardt et al., 2014/2015] present
7 different travel behavioural patterns –
the “Seven Factors”
State of the Art – Text Mining
to Extract Touristic Concepts
• [Lahlou et al., 2013] extract contextual
attributes from hotel reviews on TripAdvisor
for context-aware recommendations
• [Cosh, 2013] extracts key attributes of
destinations from Wikipedia articles
• [Schmunk et al., 2014] extract product
properties from online reviews posted on
Booking.com and TripAdvisor
Methodology (1/5)
• Hotel descriptions provided by GIATA
• Digital information on over 364,000 hotels
from 67 different tourism providers
• Text example:
“<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/>
<strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am
kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye
erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit
zahlreichen Einkaufs- und Unterhaltungsm&#246glichkeiten etwa 35km…”
Pipeline stages: Data Acquisition → Qualitative Analysis → Data Pre-Processing → Hotel Assignment → Evaluation
Methodology (2/5)
• Manual evaluation of 10 randomly selected hotels
• 20 descriptions per hotel on average
• Text length correlates with information gain
• Different providers offer similar descriptions
• “Templates” – Predefined structure of text
• Observations substantiated by statistical
analyses (lexical diversity - text length)
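The relation between lexical diversity and text length can be illustrated with a type-token ratio (distinct words divided by total words). This is a minimal sketch that assumes this measure; the slides do not name the exact statistic used, and `type_token_ratio` is a hypothetical helper.

```python
import re


def type_token_ratio(text):
    """Share of distinct words among all words (assumed diversity measure)."""
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# A template-like text that repeats words scores lower than a varied one.
repetitive = "Pool und Strand und Pool"
varied = "Pool Strand Garten Tennis Spa"
print(type_token_ratio(repetitive), type_token_ratio(varied))
```

Longer texts tend to repeat words, so the ratio usually has to be read together with text length.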
Methodology (3/5)
• Extraction of HTML content & text encoding
• Natural Language Processing
• Tokenizing
• Stopwords Removal
• Stemming
• Pruning
• Word vector generation (TF-IDF)
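The pre-processing steps listed here can be sketched with the Python standard library. This is an illustration under stated assumptions, not the tooling used in the study: the stopword list is a tiny made-up sample, `crude_stem` is a rough stand-in for a real German stemmer, and pruning is assumed to mean dropping terms below a minimum document frequency.

```python
import html
import math
import re
from collections import Counter

# Tiny illustrative German stopword list (assumption; real lists are longer).
STOPWORDS = {"von", "am", "den", "sie", "nach", "es", "mit", "und", "sind",
             "ca", "etwa", "direkt"}


def clean_html(raw):
    """Drop tags, then decode character references such as &#228;."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase and keep alphabetic tokens, including German umlauts."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def crude_stem(token):
    """Very rough suffix stripping (hypothetical stand-in for a stemmer)."""
    for suffix in ("ern", "en", "er", "es", "e", "n", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(raw):
    """HTML extraction, tokenizing, stopword removal, stemming."""
    return [crude_stem(t) for t in tokenize(clean_html(raw))
            if t not in STOPWORDS]


def tfidf(docs, min_df=1):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {t for t, n in df.items() if n >= min_df}  # pruning of rare terms
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors


raw = "<strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am Sandstrand"
print(preprocess(raw))
```

Terms that occur in every description get a TF-IDF weight of zero, which is one reason template text shared across providers contributes little to the vectors.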
Methodology (4/5)
• Mapping of hotel descriptions to the Seven
Factors using three approaches
1. Clustering
2. Classification
3. Dictionary-based approach
Methodology (5/5)
• Training, validation and evaluation with a
labelled data set established by an Austrian
travel operator
• Training & validation set: 371 hotels
• Test set: 180 hotels
1. Clustering (1/3)
• Unsupervised learning, fully automated
• Clustering method: K-Means
• Similarity measure: Cosine similarity
• Data: Training set with 371 hotels
• Number of clusters: 6 (based on various
clustering evaluation coefficients)
Goal: Generation of disjoint clusters of hotel
descriptions which reflect the Seven Factors
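K-Means with cosine similarity can be sketched as follows. This is an illustrative variant (often called spherical k-means), assuming unit-normalised vectors so that the dot product equals the cosine similarity; the naive initialisation from the first k points and the helper names are assumptions, not details from the study.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Assign each vector to the centroid with the highest cosine similarity."""
    points = [normalize(v) for v in vectors]
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Toy 2-D "word vectors"; the study clusters 371 TF-IDF vectors with k = 6.
print(spherical_kmeans([[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]], k=2))
```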
1. Clustering (2/3)
Distribution of Seven Factors
Cluster Hotels
0 135
1 43
2 38
3 48
4 59
5 48
1. Clustering (3/3)
Distribution of travel operators
Cluster Hotels
0 135
1 43
2 38
3 48
4 59
5 48
2. Classification
• Supervised learning
• Classifiers: Naive Bayes, KNN, Decision Tree
• Validation: 10-fold cross-validation
• Data: Training set with 371 hotels
Goal: Generation of seven models which can
be allocated to the Seven Factors
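One model per factor, validated with 10-fold cross-validation, can be sketched like this. The example assumes a binary multinomial Naive Bayes with add-one smoothing and a simple interleaved fold split; the function names and the toy data are hypothetical, and the study's actual classifier settings are not given on the slides.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """docs: token lists; labels: True if the hotel belongs to the factor."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def predict_nb(model, doc):
    """Return the label with the higher log-posterior."""
    counts, n_docs, vocab = model
    total_docs = sum(n_docs.values())
    scores = {}
    for label in (True, False):
        if n_docs[label] == 0:
            continue
        total = sum(counts[label].values())
        score = math.log(n_docs[label] / total_docs)
        for token in doc:
            if token in vocab:  # unseen tokens are ignored
                score += math.log((counts[label][token] + 1)
                                  / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def cross_validated_precision(docs, labels, folds=10):
    """Precision of the positive class, pooled over all folds."""
    tp = fp = 0
    for f in range(folds):
        test_idx = set(range(f, len(docs), folds))
        train = [(d, lab) for i, (d, lab) in enumerate(zip(docs, labels))
                 if i not in test_idx]
        model = train_nb([d for d, _ in train], [lab for _, lab in train])
        for i in test_idx:
            if predict_nb(model, docs[i]):
                if labels[i]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0


# Toy data: even-indexed hotels are "Sportive", odd-indexed ones are not.
docs = [["tennis", "fitness"] if i % 2 == 0 else ["ruhig", "garten"]
        for i in range(20)]
labels = [i % 2 == 0 for i in range(20)]
print(cross_validated_precision(docs, labels))
```

Repeating the same procedure with a different label column per factor yields the seven models.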
3. Dictionary
• Experts identify the most important words
for each of the Seven Factors
Goal: Classification using the dictionary terms as attributes
Sunlover: Strand, Swimmingpool, Wlan, Liege, Meer, Beach, inclusive, Internetzugang, Liegestuhl, Meer, Meerblick, Pool, Sonnenschirm, Sonnenterasse, Wifi
Educational: Buffet, Club, Halbpension, inclusive, Miniclub, Musik
Independent: Eigenregie, gemütlich, individuell, lokal, Zentrum
Cultural: Spa, Suite, superior, elegant, Carte, Bademantel, Mietsafe, modern, Wellnessbereich, Whirlpool
Sportive: Aktivität, Fitness, Fitnessraum, Sport, Tennis, Tischtennis, Tennisplatz, Golfplatz
Riskseeker: Club, Stadt, Unterhaltung
Escapist: Ruhig, gemütlich, Wellness, Park, Garten, Spa, Wellnessbereich
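A dictionary lookup of this kind can be sketched as a count of matching terms per factor. The word sets below are a lowercased subset of the lists above; the whole-token matching and the `min_hits` threshold are assumptions, since the slides do not state how matches are turned into an assignment.

```python
import re

# Lowercased subset of the expert dictionaries (three of the seven factors).
DICTIONARIES = {
    "Sunlover": {"strand", "pool", "meer", "meerblick", "sonnenschirm",
                 "liegestuhl"},
    "Sportive": {"fitness", "fitnessraum", "sport", "tennis", "tischtennis",
                 "golfplatz"},
    "Escapist": {"ruhig", "gemütlich", "wellness", "park", "garten", "spa"},
}


def dictionary_scores(text):
    """Number of distinct dictionary terms found in the text, per factor."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {factor: len(tokens & words)
            for factor, words in DICTIONARIES.items()}


def assign_factors(text, min_hits=2):
    """Factors whose dictionary matches at least min_hits terms (assumed rule)."""
    return [factor for factor, hits in dictionary_scores(text).items()
            if hits >= min_hits]


description = "Ruhig gelegenes Hotel mit Garten, Spa und Pool am Strand"
print(dictionary_scores(description))
print(assign_factors(description))
```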
Classification vs. Dictionary
Validation of training set with 371 hotels
[Chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0% to 100%; series: Precision Dictionary, Precision Classification]
Final Evaluation
Evaluation of best approaches with independent
test set
[Chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0% to 100%; series: Precision Test Set (180 Hotels), Precision Training Set (371 Hotels)]
Approach evaluated per factor: dictionary for Sunlover, classification for the other factors
Conclusion
• Allocation of hotels to tourist profiles using
textual data can be implemented successfully,
depending on the targeted user group
+ Sunlover, Escapist, Cultural, Sportive
- Educational, Independent, Riskseeker
• The majority of the designed models are
capable of dealing with new hotel data
• Hotel descriptions can be a reasonable basis
for recommendations in recommender systems
Thanks for your attention!
References (1/3)
• Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data, pages
77–128. Springer-Verlag New York.
• Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc.
• Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In
Recommender Systems Handbook, pages 367–386. Springer US.
• Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182.
• Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. The 2013 10th
International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 43–48.
• Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism
Research, 29(2): 358–383.
• Goldberg, L. R. (1993). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1):
26–34.
• Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt.Ltd.
• Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290.
• Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental
perspective. Working Papers of Department of Linguistics and Phonetics, Lund University, 53:61–79.
References (2/3)
• Lahlou, F. Z., Mountassir, A., Benbrahim, H., & Kassou, I. (2013). A Text Classification Based Method
for Context Extraction from Online Reviews. 8th International Conference on Intelligent Systems:
Theories and Applications (SITA).
• Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown
preferences. In Proceedings of the 8th ACM Conference on Recommender systems, 309-312. ACM.
• Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to
recommender systems. Information Technology & Tourism, 15(1): 49-69.
• Neidhardt, J., & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the Seven-
Factor Model. In Information and Communication Technologies in Tourism 2017 (pp. 503-515).
Springer.
• Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In
Recommender Systems Handbook, pages 1-35. Springer US.
• Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting Decision-
Relevant Knowledge from UGC. Information and Communication Technologies in Tourism, 14(1):
253-265.
• Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi criteria review-based hotel recommendation
system. 15th IEEE International Conference on Computer and Information Technology, pages 687–
691.
References (3/3)
• Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining. Springer-
Verlag London.
• Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship.
Wien - New York: Springer-Verlag.
• Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J.,
Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology &
Tourism, 15(1).
• Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303.
• Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from
Mining TripAdvisor Hotel Reviews. Information and Communication Technologies in Tourism, 17(1):
625–637.

Contenu connexe

Similaire à Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns Using Text Mining

On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)Galit Shmueli
 
Efficiency of hotels web sites
Efficiency of hotels web sitesEfficiency of hotels web sites
Efficiency of hotels web sitesestrella_diaz
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdfHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdffarhanaaansari42
 
AI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAmine Bendahmane
 
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETSTOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETIRJET Journal
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptxHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptxfarhanaaansari42
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational StatisticsSetia Pramana
 
Location Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip RecommendationLocation Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip RecommendationRaphael Troncy
 
An analysis study of tourist perception and satisfaction towards hotels in agra
An analysis study of tourist perception and satisfaction towards hotels in agraAn analysis study of tourist perception and satisfaction towards hotels in agra
An analysis study of tourist perception and satisfaction towards hotels in agrajs slides
 

Similaire à Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns Using Text Mining (20)

On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
 
Smart hotels of today and tomorrow
Smart hotels of today and tomorrowSmart hotels of today and tomorrow
Smart hotels of today and tomorrow
 
Efficiency of hotels web sites
Efficiency of hotels web sitesEfficiency of hotels web sites
Efficiency of hotels web sites
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdfHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
 
How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...
 
Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...
 
TR6124 Assignment 3.pptx
TR6124 Assignment 3.pptxTR6124 Assignment 3.pptx
TR6124 Assignment 3.pptx
 
Constructing a Data Warehouse Based Decision Support Platform for China Touri...
Constructing a Data Warehouse Based Decision Support Platform for China Touri...Constructing a Data Warehouse Based Decision Support Platform for China Touri...
Constructing a Data Warehouse Based Decision Support Platform for China Touri...
 
AI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAI techniques for tourism-oriented applications
AI techniques for tourism-oriented applications
 
What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...
 
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETSTOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptxHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
 
Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)
 
Automatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual FeaturesAutomatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual Features
 
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
Location Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip RecommendationLocation Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip Recommendation
 
Prioritisation of Key Performance Indicators in an Evaluation Framework for D...
Prioritisation of Key Performance Indicators in an Evaluation Framework for D...Prioritisation of Key Performance Indicators in an Evaluation Framework for D...
Prioritisation of Key Performance Indicators in an Evaluation Framework for D...
 
An analysis study of tourist perception and satisfaction towards hotels in agra
An analysis study of tourist perception and satisfaction towards hotels in agraAn analysis study of tourist perception and satisfaction towards hotels in agra
An analysis study of tourist perception and satisfaction towards hotels in agra
 
saurabh10
saurabh10saurabh10
saurabh10
 

Dernier

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Dernier (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns Using Text Mining

  • 1. ENTER 2018 Research Track Slide Number 1 Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns Lisa Glatzer, Julia Neidhardt and Hannes Werthner E-Commerce TU Wien, Austria lisa.glatzer@ec.tuwien.ac.at http://www.ec.tuwien.ac.at
  • 2. ENTER 2018 Research Track Slide Number 2 Background • The Web has dramatically changed the tourism industry; travellers book the accommodations for their vacations increasingly online • Web platforms aim to recommend hotels to their customers that best fit their preferences • However, tourism domain is very complex • Therefore, novel, user-centric recommendation approaches have been introduced, e.g., seven- factor model
  • 3. ENTER 2018 Research Track Slide Number 3 Seven-Factor Model • Personality-based approach: factors combining Big Five personality traits & 17 tourist roles • Each factor reflects travel behavioural patterns Sunlover Educational Independent Cultural Sportive Riskseeker Escapist [Neidhardt et al., 2014]
  • 4. ENTER 2018 Research Track Slide Number 4 Focus of the Work • Analysis of hotel descriptions by travel operators using text mining • Classification of hotels with different machine learning approaches • Assignment of hotels to travel behavioural patterns (i.e., seven factors)
  • 5. ENTER 2018 Research Track Slide Number 5 Research Questions (1) How can textual hotel descriptions be used to identify concepts to enable a classification of hotel descriptions? (2) Can the identified concepts be assigned to different predefined travel behavioural patterns and, in turn, be used to deliver recommendations?
  • 6. ENTER 2018 Research Track Slide Number 6 State of the Art – Tourist Roles • [Cohen, 1972] studied motives for people to travel & established 4 different tourist roles • [Gibson & Yiannakis, 2002] identified 17 tourist roles (15 in their previous work) & studied relation of age, gender, education and tourist preferences • [Neidhardt et al., 2014/2015] present 7 different travel behavioural patterns – the “Seven Factors”
  • 7. ENTER 2018 Research Track Slide Number 7 State of the Art – Text Mining to Extract Touristic Concepts • [Lahlou et al., 2013] extract contextual attributes from hotel reviews on TripAdvisor for context-aware recommendations • [Cosh, 2013] extracts key attributes of destination from Wikipedia articles • [Schmunk et al., 2014] extract product properties from online reviews posted on Booking.com and TripAdvisor
  • 8. ENTER 2018 Research Track Slide Number 8 Methodology (1/5) • Hotel descriptions provided by GIATA • Digital information of over 364,000 hotels by 67 different tourist providers • Text example: “<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/> <strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit zahlreichen Einkaufs- und Unterhaltungsm&#246glichkeiten etwa 35km…” Data Acquisition Qualitative Analysis Data Pre- Processing Hotel Assignment Evaluation
  • 9. ENTER 2018 Research Track Slide Number 9 Methodology (2/5) • Manual evaluation of 10 rand. selected hotels • 20 descriptions per hotel on average • Text length correlates with information gain • Different provider offer similar descriptions • “Templates” – Predefined structure of text • Observations substantiated by statistical analyses (lexical diversity - text length) Data Acquisition Qualitative Analysis Data Pre- Processing Hotel Assignment Evaluation
  • 10. ENTER 2018 Research Track Slide Number 10 Methodology (3/5) • Extraction of html-content & text encoding • Natural Language Processing • Tokenizing • Stopwords Removal • Stemming • Pruning • Word vector generation (TF-IDF) Data Acquisition Qualitative Analysis Data Pre- Processing Hotel Assignment Evaluation
  • 11. ENTER 2018 Research Track Slide Number 11 Methodology (4/5) • Mapping of hotel descriptions to Seven Factors using three approaches: (1) clustering, (2) classification, (3) dictionary-based approach (Process: Data Acquisition, Qualitative Analysis, Data Pre-Processing, Hotel Assignment, Evaluation)
  • 12. ENTER 2018 Research Track Slide Number 12 Methodology (5/5) • Training, validation and evaluation with a labelled data set established by an Austrian travel operator • Training & validation set: 371 hotels • Test set: 180 hotels (Process: Data Acquisition, Qualitative Analysis, Data Pre-Processing, Hotel Assignment, Evaluation)
  • 13. ENTER 2018 Research Track Slide Number 13 1. Clustering (1/3) • Unsupervised learning, fully automated • Clustering method: K-Means • Similarity measure: Cosine similarity • Data: Training set with 371 hotels • Number of clusters: 6 (based on various clustering evaluation coefficients) • Goal: Generation of disjoint clusters of hotel descriptions which reflect the Seven Factors
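One common way to combine K-Means with cosine similarity is spherical k-means, where both the vectors and the centroids are scaled to unit length. The sketch below assumes that variant; the slides do not state how the two were combined. The initialisation (first k points), the toy vectors and k = 2 are chosen only to keep the example small and deterministic, whereas the study used 6 clusters on 371 hotels.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Assign each vector to the centroid with the highest cosine similarity."""
    points = [normalize(v) for v in vectors]
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Hypothetical 3-dimensional word vectors for four hotel descriptions.
toy_vectors = [[3, 0, 0.5], [0, 2, 0.1], [2.5, 0.2, 0], [0.1, 3, 0]]
labels = spherical_kmeans(toy_vectors, k=2)
```

The resulting clusters are disjoint by construction, since every description receives exactly one label.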
  • 14. ENTER 2018 Research Track Slide Number 14 1. Clustering (2/3) • Distribution of Seven Factors [figure] • Hotels per cluster: cluster 0: 135; cluster 1: 43; cluster 2: 38; cluster 3: 48; cluster 4: 59; cluster 5: 48
  • 15. ENTER 2018 Research Track Slide Number 15 1. Clustering (3/3) • Distribution of travel operator [figure] • Hotels per cluster: cluster 0: 135; cluster 1: 43; cluster 2: 38; cluster 3: 48; cluster 4: 59; cluster 5: 48
  • 16. ENTER 2018 Research Track Slide Number 16 2. Classification • Supervised learning • Classifiers: Naive Bayes, KNN, Decision Tree • Validation: 10-fold cross-validation • Data: Training set with 371 hotels • Goal: Generation of seven models which can be allocated to the Seven Factors
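As a minimal sketch of the validation scheme, the code below runs 10-fold cross-validation with a k-nearest-neighbour classifier, one of the three classifiers named on the slide. Several points are assumptions made for illustration: one binary label per factor (here "is Sunlover"), cosine similarity as the neighbour measure, k = 3, the index-based fold split, and the toy data.

```python
import math
from collections import Counter


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """train is a list of (vector, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, folds=10, k=3):
    """Hold out every folds-th item in turn and return the overall accuracy."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, vec, k) == label for vec, label in test)
    return correct / len(data)


# Hypothetical labelled vectors: True means the hotel was labelled "Sunlover".
data = [
    ({"strand": 1.0, "pool": 0.5}, True),
    ({"tennis": 1.0, "fitness": 0.5}, False),
] * 10
accuracy = cross_validate(data, folds=10, k=3)
```

Repeating this with one label column per factor would yield seven models, which matches the stated goal, although the slides do not confirm that the models were built this way.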
  • 17. ENTER 2018 Research Track Slide Number 17 3. Dictionary • Identification of most important words by experts for all Seven Factors • Goal: Classification with attributes of dictionaries • Sunlover: Strand, Swimmingpool, Wlan, Liege, Meer, Beach, inclusive, Internetzugang, Liegestuhl, Meerblick, Pool, Sonnenschirm, Sonnenterasse, Wifi • Educational: Buffet, Club, Halbpension, inclusive, Miniclub, Musik • Independent: Eigenregie, gemütlich, individuell, lokal, Zentrum • Cultural: Spa, Suite, superior, elegant, Carte, Bademantel, Mietsafe, modern, Wellnessbereich, Whirlpool • Sportive: Aktivität, Fitness, Fitnessraum, Sport, Tennis, Tischtennis, Tennisplatz, Golfplatz • Riskseeker: Club, Stadt, Unterhaltung • Escapist: Ruhig, gemütlich, Wellness, Park, Garten, Spa, Wellnessbereich
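A dictionary-based assignment can be sketched as a simple keyword count. The word lists below are subsets of the expert dictionaries on the slide; the matching rule (a factor is assigned when at least `min_hits` dictionary words occur) and the sample description are assumptions, since the slides do not describe how dictionary matches were turned into a decision.

```python
import re

# Subsets of the expert dictionaries from the slide, lower-cased.
DICTIONARIES = {
    "Sunlover": {"strand", "pool", "meer", "meerblick", "sonnenschirm"},
    "Sportive": {"fitness", "sport", "tennis", "golfplatz"},
    "Escapist": {"ruhig", "wellness", "park", "garten", "spa"},
}


def assign_factors(description: str, min_hits: int = 2) -> list[str]:
    """Return the factors whose dictionary matches at least min_hits tokens."""
    tokens = set(re.findall(r"\w+", description.lower()))
    hits = {factor: len(tokens & words) for factor, words in DICTIONARIES.items()}
    return sorted(factor for factor, n in hits.items() if n >= min_hits)


# Hypothetical description for illustration only.
factors = assign_factors("Ruhig gelegenes Hotel mit Garten, Pool und Meerblick")
```

Because the dictionaries overlap (for example "Spa" appears under both Cultural and Escapist on the slide), a hotel can match more than one factor under this rule.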
  • 18. ENTER 2018 Research Track Slide Number 18 Classification vs. Dictionary • Validation on training set with 371 hotels • [Bar chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0% to 100%; series: Precision Dictionary, Precision Classification]
  • 19. ENTER 2018 Research Track Slide Number 19 Final Evaluation • Evaluation of best approaches with independent test set • [Bar chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0% to 100%; series: Precision Test Set (180 Hotels), Precision Training Set (371 Hotels)] • Best approach per factor: dictionary for Sunlover, classification for the other factors
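Both charts report precision per factor. As a reminder of what that measures, precision is the share of hotels assigned to a factor that actually carry that label. The sketch below uses hypothetical hotel IDs and assignments, not results from the study.

```python
def precision(predicted: set, labelled: set) -> float:
    """True positives divided by all predicted positives."""
    if not predicted:
        return 0.0
    return len(predicted & labelled) / len(predicted)


# Hypothetical assignments: hotel IDs predicted for and labelled with each factor.
predicted = {"Sunlover": {"h1", "h2", "h3", "h4"}, "Riskseeker": {"h2", "h5"}}
labelled = {"Sunlover": {"h1", "h2", "h3"}, "Riskseeker": {"h6"}}

scores = {factor: precision(predicted[factor], labelled[factor]) for factor in predicted}
```

Precision says nothing about hotels that were missed, so a factor can score well here while still having low recall.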
  • 20. ENTER 2018 Research Track Slide Number 20 Conclusion • Allocation of hotels to tourist profiles using textual data can be successfully implemented, depending on the targeted user group • Good results (+): Sunlover, Escapist, Cultural, Sportive • Weaker results (-): Educational, Independent, Riskseeker • Majority of designed models are capable of dealing with new hotel data • Hotel descriptions can be a reasonable basis for recommendations in recommender systems
  • 21. ENTER 2018 Research Track Slide Number 21 Thanks for your attention!
  • 22. ENTER 2018 Research Track Slide Number 22 References (1/3) • Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data, pages 77–128. Springer-Verlag New York. • Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc. • Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In Recommender Systems Handbook, pages 367–386. Springer US. • Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182. • Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 43–48. • Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism Research, 29(2): 358–383. • Goldberg, L. R. (1999). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1): 26–34. • Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt.Ltd. • Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290. • Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers of Department of Linguistics and Phonetics, Lund University, 53:61–79.
  • 23. ENTER 2018 Research Track Slide Number 23 References (2/3) • Lahlou, F. Z., Mountassir, A., Benbrahim, H., & Kassou, I. (2013). A Text Classification Based Method for Context Extraction from Online Reviews. 8th International Conference on Intelligent Systems: Theories and Applications (SITA). • Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown preferences. In Proceedings of the 8th ACM Conference on Recommender systems, 309-312. ACM. • Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to recommender systems. Information Technology & Tourism, 15(1): 49-69. • Neidhardt, J., & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the Seven-Factor Model. In Information and Communication Technologies in Tourism 2017 (pp. 503-515). Springer. • Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In Recommender Systems Handbook, pages 1-35. Springer US. • Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting Decision-Relevant Knowledge from UGC. Information and Communication Technologies in Tourism, 14(1): 253-265. • Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi criteria review-based hotel recommendation system. 15th IEEE International Conference on Computer and Information Technology, pages 687–691.
  • 24. ENTER 2018 Research Track Slide Number 24 References (3/3) • Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining. Springer-Verlag London. • Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship. Wien - New York: Springer-Verlag. • Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J., Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology & Tourism, 15(1). • Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303. • Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. Information and Communication Technologies in Tourism, 17(1): 625–637.