The document discusses a study that aimed to understand how user-generated pictures on Flickr could be used to determine similarity between destinations for recommendation purposes. The researchers harvested pictures of 233 cities from Flickr and represented each city by the tags attached to its pictures. Four systems based on different tag data were tested with users, and the one built on randomly sampled picture tags (System C) was judged the most reliable. The results indicate that it is possible to define similar destinations, and to recommend them, by analyzing the tags of pictures uploaded to social media platforms such as Flickr.
Flickr Destinations Similarity
1. www.bournemouth.ac.uk
Harvesting User Generated Picture Information
To Understand Destination Similarity
Dr. Alessandro Inversini
School of Tourism
Bournemouth University
Dr. Davide Eynard
Faculty of Informatics
Università della Svizzera italiana
linkedin.com/in/inversini
@beanbol
beanbol.com
ainversini@bournemouth.ac.uk
June 6th 2013
aim
To understand:
the importance of user-generated pictures in understanding destination
similarity, as a basis for recommending a destination to visit.
(Diagram: RecSys, Web2.0, Picture)
• Tourists have technological needs throughout the whole tourism
consumption process
• Advances in technology have made it easier to take
and share pictures.
What is happening with technologies and social media?
Gretzel et al., 2006
Web2.0 & Social Media
i) the web is conceived more as a public square in which to connect and exchange
opinions than as a library;
ii) publishing content has become widespread thanks to easy-to-use
websites and applications;
iii) the availability of high-bandwidth connections makes wider use of
multimedia possible, leading to good-quality, interactive content provided by the users
themselves (Cantoni and Tardini, 2009).
Web2.0 & Social Media
Social Media are: “media impressions created by consumers, typically informed by
relevant experience, and archived or shared online for easy access by other
impressionable consumers” (Blackshaw, 2006)
They represent “a mixture of fact and opinion, impression and sentiment, founded
and unfounded tidbits, experiences, and even rumor” (Blackshaw & Nazzaro, 2006)
Social media are important because they help spread electronic word
of mouth (eWOM) across the web (Litvin, Goldsmith, & Pan, 2008).
Web2.0 & Social Media
One in two tourists views destination photos posted as UGC in different web communities
(Yoo and Gretzel, 2009).
UGC is used, for example, to understand culture (Pengiran-Kaha et al., 2010) or to recommend
a place to visit (Linanza et al., 2011).
According to Xiang and Gretzel (2010), social media play a relevant role
in online travel and tourism search.
Image and video sharing websites account for 3.8% (Inversini and Cantoni, 2011).
Flickr.com
Tags: terms used to describe the picture
Geotags: description of the location of the picture
Folksonomies and Personomies
The term folksonomy was introduced by Vander Wal
(2004), by mixing the terms “folk” and “taxonomy”.
Users assign a set of terms called tags to an
individual piece of content in order to group or
classify it for retrieval (Sturtz, 2004).
The collection of the tags of a single user is called
personomy, while the collection of personomies is
called folksonomy (Hotho et al., 2006)
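Hotho et al. (2006) formalise a folksonomy as a set of tag assignments linking users, tags and resources; a personomy is then the part of it that belongs to one user. A minimal Python sketch under that reading, in which the users, tags and photo IDs are hypothetical:

```python
from collections import namedtuple

# One tag assignment: a user applied a tag to a resource (here, a photo).
TagAssignment = namedtuple("TagAssignment", ["user", "tag", "photo"])

# Hypothetical folksonomy: every tag assignment made by every user.
folksonomy = {
    TagAssignment("alice", "colosseum", "p1"),
    TagAssignment("alice", "rome", "p1"),
    TagAssignment("bob", "rome", "p2"),
    TagAssignment("bob", "vatican", "p2"),
}


def personomy(assignments, user):
    """Return the tag assignments made by a single user."""
    return {a for a in assignments if a.user == user}


def tag_vocabulary(assignments):
    """Return the distinct tags used across the given assignments."""
    return {a.tag for a in assignments}


print(sorted(tag_vocabulary(personomy(folksonomy, "alice"))))  # ['colosseum', 'rome']
print(sorted(tag_vocabulary(folksonomy)))  # ['colosseum', 'rome', 'vatican']
```

The union of all personomies gives back the whole folksonomy, which mirrors the definition on the slide.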
Recommendation System
Collaborative filtering aggregates data about user preferences (e.g. ratings) to recommend
new products. In the specific case of tourism destinations, this would require users to (i) visit a
destination and (ii) explicitly provide a rating for it.
Content-based filtering (Pazzani & Billsus, 2007) mainly exploits user preferences (implicit or
explicit) to build a model of the user's interests. For the recommendation of tourism destinations,
this would require users to express their preferences (either by booking flights or rooms in
different destinations or by explicitly “liking” them). Moreover, a representation of destinations
rich enough to distinguish between what the user liked and what she did not would be necessary.
The knowledge-based approach uses knowledge about users and the
application domain to reason over product similarity and choose which
ones to recommend. In the field of tourism, this would mean finding a
metric which exploits external knowledge to define similarity between
destinations.
(Lorenzi, Ricci, Tostes and Brasil, 2005)
So??
What is the importance of user-generated pictures in understanding destination
similarity, as a basis for recommending a destination to visit?
• Harvest 233 cities from Flickr.com*
– Each city was represented by the collection of all the tags assigned to its
pictures
– Information about users (who uploaded a given picture)
– Information about the pictures (pictures sharing the same tag)
– Geotags: harvested and used to disambiguate
• 4 sets of data
– Top 100 tags (Flickr API picture only tags – System A)
– Top 100 tags (Flickr API users information – System B)
– “Random tags” (YQL only picture tags – System C)
– “Random tags” (YQL users information – System D)
method
*http://www.euromonitor.com/Top_150_City_Destinations_London_Leads_the_Way (2007 and 2008)
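The slide lists, for each city, the tags of its pictures together with information about pictures and users. One possible way to keep those counts is sketched below in Python; the record layout, the function name `city_profiles` and the sample data are hypothetical, and the sketch does not claim to reproduce how the authors stored their harvest.

```python
from collections import Counter, defaultdict

# Hypothetical harvested records: (city, photo_id, owner, tags on that photo).
photos = [
    ("Rome", "p1", "alice", ["colosseum", "rome", "night"]),
    ("Rome", "p2", "bob", ["rome", "vatican"]),
    ("Rome", "p3", "alice", ["rome", "night"]),
    ("Paris", "p4", "carol", ["eiffel", "night"]),
]


def city_profiles(records):
    """Per city, count the photos that use each tag and the distinct users who use it."""
    photo_counts = defaultdict(Counter)                # city -> tag -> number of photos
    tag_users = defaultdict(lambda: defaultdict(set))  # city -> tag -> set of owners
    for city, _photo_id, owner, tags in records:
        for tag in set(tags):  # a tag counts at most once per photo
            photo_counts[city][tag] += 1
            tag_users[city][tag].add(owner)
    user_counts = {
        city: Counter({tag: len(owners) for tag, owners in per_tag.items()})
        for city, per_tag in tag_users.items()
    }
    return dict(photo_counts), user_counts


photo_counts, user_counts = city_profiles(photos)
print(photo_counts["Rome"]["rome"], user_counts["Rome"]["rome"])    # 3 2
print(photo_counts["Rome"]["night"], user_counts["Rome"]["night"])  # 2 1
```

Keeping picture counts and user counts side by side is what makes a picture-based and a user-based weighting possible later; here "night" in Rome appears on two photos but comes from a single user.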
• Vector Space Model was used to represent the
cities in terms of their tags.
• Normalize sets (e.g. TF-IDF)
• Calculate similarity
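$\mathrm{similarity}(a, b) = \cos\theta = \dfrac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
$\mathrm{IDF}_{U}(i, j)$: user-based IDF, computed from $P_j$, $P_{i,j}$, $U_{i,j}$ and $U_j$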
method
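The slide's similarity(a, b) is the cosine of the angle between two city vectors. A sketch in Python, assuming each city is stored as a sparse {tag: weight} dictionary (an assumption, not something the slides specify) and that the weights have already been normalized:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {tag: weight} dicts."""
    dot = sum(weight * b[tag] for tag, weight in a.items() if tag in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical weighted tag vectors for three cities.
rome = {"history": 3.0, "art": 4.0}
venice = {"history": 3.0}
ibiza = {"beach": 2.0}
print(cosine_similarity(rome, venice))  # 0.6
print(cosine_similarity(rome, ibiza))   # 0.0
```

Only tags shared by both cities contribute to the dot product, so cities with disjoint tag sets score 0.0 regardless of how many tags they have.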
method
• Submit sets to a sample of users
51 users
296 valid observations
- 47% 25-30 years old
- 45% Italian
- 50.9% expert travellers*
* travelled 5-10 times in the previous year
Results
• System C was the most reliable for users
• 37.5% of choices given with a high or fairly good degree of confidence
– Highest level of confidence for System C
– Lowest level of confidence for System A
Discussion & Conclusion
• It is possible to define similar destinations on the basis of
picture tags.
• The Flickr top-tags API alone is not enough for defining destination
similarity (System C vs System A)
BUT
• Information about pictures is enough for defining
destination similarity.
• IT IS THEREFORE POSSIBLE to recommend destinations
on the basis of the pictures uploaded on social media.
Harvesting User Generated Picture Information
To Understand Destination Similarity
Dr. Alessandro Inversini
School of Tourism
Bournemouth University
Dr. Davide Eynard
Faculty of Informatics
Università della Svizzera italiana
linkedin.com/in/inversini
@beanbol
beanbol.com
ainversini@bournemouth.ac.uk
May 22nd 2013
Speaker notes
Markwell (1997) argues that even the stereotype of tourists carrying big cameras, lenses and tripods is in some way a sign of the ineluctable relationship between tourism and photography. Indeed, while on the one hand photographic representations of destinations and tourism attractions are there to inspire a tourist's visit to a destination, on the other hand taking pictures represents the major focus of activity for the tourist (Jenkins, 2003).
Moreover, as described by Garrod (2009), there is a strong correspondence between the images promoted by the tourism industry and those taken by tourists. Garrod's (2009) research confirmed the study of another author: Urry (1990) highlighted the iconic status of the scenery and of the things represented in tourism photography (e.g. the UK red telephone box), showing on the one hand that tourism seems to be essentially about virtually “consuming places” (i.e. before visiting them), and on the other hand that the importance of photography for travelers (i.e. after visiting a place) is essentially related to demonstrating the actual trip: travelers show their own version of the (iconic) place they had seen before the trip.
At the destination level, image plays an influential role in supporting travelers' decision-making process (Choi, 2007). Even if brochures may have been the predominant vehicle of tourist imaginaries (Hunter, 2008), nowadays websites and new technologies are also incorporating hedonic messages, as underlined by Gretzel and Fesenmaier (2003), who stated that to be effective, new technologies need to incorporate sensory information.
As travel and tourism are experience-based activities (e.g. Tussyadiah & Fesenmaier, 2008), such experiences need to be communicated. Communities, blogs, travel review websites and social media in general offer publication outlets that help users share information (Arsal et al., 2008). These websites are gaining substantial popularity in online travellers' use of the internet (Gretzel, 2006; Pan et al., 2007).
Web 2.0 does not provide any new protocol or completely new technology (although a range of related technologies, such as Ajax, has been developed around it). It mainly represents a different use of the web itself, characterized by different expectations, goals and practices (Kolbitsch and Maurer, 2006): (i) the web is conceived more as a public square in which to connect and exchange opinions than as a library, (ii) publishing content has become widespread thanks to easy-to-use websites and applications, and (iii) the availability of high-bandwidth connections makes wider use of multimedia possible, leading to good-quality, interactive content provided by the users themselves (Cantoni and Tardini, 2009).
Social media are playing an increasingly important role as information sources for travelers, as they increasingly appear in search engine results for travel-related searches (Hays, Page and Buhalis, 2012). Social media constitute a substantial part of the search results, and therefore traditional providers of travel-related information will have to ensure that they include social media in their online marketing (Xiang and Gretzel, 2010). Looking forward, successful tourism organisations will increasingly need to identify consumer needs rapidly and to interact with prospective clients through online, comprehensive, personalised and up-to-date communication media in order to design products that satisfy tourism demand.
Flickr.com is considered the most relevant and popular image sharing social network (Alexa rank 34, US rank 26, with 730,248 inbound links to date). Moreover, within social media, user-generated travel pictures carry a lot of information: on the one hand they are often described by sets of short terms called “tags” (which, once collected among the users of a system, constitute a folksonomy), and on the other hand they can often be placed on a map (i.e. they are geolocated).
Among the three approaches, collaborative and content-based filtering usually require a huge number of training examples to work properly. This is the origin of the so-called bootstrapping problem, i.e. the problem of providing good recommendations without already having a strong user base. The approach described in this paper, which relies on external knowledge (extracted from the tags applied to geolocated photos), is knowledge-based, and as such it might represent a possible solution to the problem of recommending destinations without incurring the bootstrapping problem.
In other words, the study investigates the hidden power of user-generated pictures and their information (i.e. picture information and/or user information) to suggest a possible new tourist experience without knowing anything about the traveler except a single picture (or a collection of pictures). The research questions are therefore:
RQ1: Is it possible to use user-generated pictures to find similar destinations?
RQ2: What information is more relevant to determine similar destinations?
RQ2.1: Is information about pictures more relevant to determine similar destinations?
RQ2.2: Is information about users more relevant to determine similar destinations?
RQ3: Is it possible to suggest a destination to visit based on personal pictures?
Flickr.com easily allows users to get the top tags for a given location (specified as a WOEID) through its flickr.places.tagsForPlace API. However, this API only returns the top 100 unique tags, without any information about the photos taken or the users who uploaded them. For this reason, two distinct datasets were built. The former, called Top100, contains the tags retrieved using the aforementioned API; the latter, called Random, contains a random sampling of photo metadata obtained by querying Flickr APIs with YQL (Yahoo Query Language, an expressive SQL-like language that lets developers query, filter and join data across Web services). Selecting, for each city, 10 photos from 300 random days, taken at random hours, avoids bias due to day- or time-related events. An important advantage of this second approach is that user- and photo-related information is available, providing new dimensions across which tag analysis can be performed.
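The Random dataset is described as 10 photos per city from 300 random days at random hours. The Python sketch below only builds such a sampling schedule, reading the description as 10 photos for each of 300 distinct days; it makes no Flickr or YQL calls, and the date window, the seed and the name `sampling_plan` are hypothetical.

```python
import random
from datetime import date, timedelta


def sampling_plan(start, end, n_days=300, photos_per_slot=10, seed=None):
    """Pick n_days distinct random days in [start, end], each paired with a random hour.

    Returns (day, hour, photos_per_slot) triples describing which photos to request.
    """
    rng = random.Random(seed)
    span = (end - start).days + 1
    if n_days > span:
        raise ValueError("date range is shorter than the number of days requested")
    offsets = rng.sample(range(span), n_days)  # distinct days, no repeats
    return [
        (start + timedelta(days=offset), rng.randrange(24), photos_per_slot)
        for offset in sorted(offsets)
    ]


# Hypothetical harvesting window; the source does not state the dates used.
plan = sampling_plan(date(2011, 1, 1), date(2012, 12, 31), seed=42)
print(len(plan), sum(n for _, _, n in plan))  # 300 3000
```

Spreading requests over many days and hours is what keeps a single festival, match or holiday from dominating a city's tag profile; how each (day, hour) slot was turned into an actual query is not specified in the source.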
VSM: The VSM (Vector Space Model) was used to represent cities in terms of their related tags. In the VSM, each city is represented by a vector in an n-dimensional space, where n is the number of distinct tags, whose components are weighted according to tag frequency. Of course, as tags that are too popular tend to be more widespread and thus less informative, frequency is normalized using the TF-IDF approach, which normalizes a TF (Term Frequency) with an IDF (Inverse Document Frequency) factor. This takes into account the number of documents containing that term (in our case, the number of cities for whose photos a given tag has been used).
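A sketch of the standard TF-IDF weighting described above, with raw tag counts as TF and log(N / df) as IDF, where N is the number of cities and df the number of cities whose photos use the tag. The study also tested picture-based (IDFP) and user-based (IDFU) variants, which this sketch does not attempt to reproduce; the function name and city data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(city_tags):
    """Weight each city's tag counts by TF * log(N / df).

    city_tags maps a city name to a Counter of tag -> raw frequency.
    df(tag) is the number of cities whose photos use that tag.
    """
    n_cities = len(city_tags)
    df = Counter()
    for counts in city_tags.values():
        df.update(counts.keys())  # each city counts once per tag
    return {
        city: {tag: tf * math.log(n_cities / df[tag]) for tag, tf in counts.items()}
        for city, counts in city_tags.items()
    }


city_tags = {
    "Rome": Counter({"travel": 50, "colosseum": 20}),
    "Paris": Counter({"travel": 40, "eiffel": 30}),
}
weights = tfidf_vectors(city_tags)
print(weights["Rome"]["travel"])  # 0.0, since "travel" appears in every city
```

A tag used in every city gets weight zero, which is the down-weighting of overly popular tags that the paragraph motivates.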
Each user judged one (random) city at a time, for a total of twenty distinct cities. For each of them, the user was shown the results of the four scoring systems, in the form of four lists containing the top-five related cities. After the user chose a similarity system, the interface posed two questions in order to (i) assess the respondent's confidence in the association made (measured on a 1-5 Likert scale) and (ii) assess the respondent's willingness to recommend the cities selected by the chosen system to someone who had visited the main city.
In the front end, sections were allocated to: (1) information about the place taken from Yahoo! GeoPlanet (http://developer.yahoo.com/geo/geoplanet/), which was also used to disambiguate the given city from its homonyms (e.g. in the case of Rome it is possible to have a conflict between Rome in Italy and Rome in Georgia); (2) a map from Yahoo! Maps; (3) instructions on how to complete the survey (redundancy with the first part of the survey was desired because every user was asked to complete 15 cities); (4) the four systems to be rated (unlike in previous experiences, the font size was the same for all systems in order not to influence respondents). Since the similarity between two places is based on the similarity of their tag descriptions, users could check the tags two cities had in common simply by clicking on the name of the city. This is clearly a simplification, as the adopted similarity metrics were not based on a simple term match, but it was considered useful to provide a rough idea of why two cities had been considered similar.
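Each system was shown to users as a list of the top-five cities related to the main city. Assuming a system can be modelled as a similarity function over city vectors, producing such a list is a top-k ranking; the Python sketch below uses cosine similarity and hypothetical vectors, and `top_k_similar` is an illustrative name rather than anything from the study.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between sparse {tag: weight} vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(target, vectors, similarity, k=5):
    """Rank every other city by similarity to target and keep the k best, best first."""
    scored = ((similarity(vectors[target], vec), city)
              for city, vec in vectors.items() if city != target)
    return [city for _score, city in heapq.nlargest(k, scored)]


# Hypothetical weighted tag vectors.
vectors = {
    "Rome": {"history": 3.0, "art": 4.0},
    "Florence": {"history": 3.0, "art": 4.0},
    "Venice": {"history": 3.0},
    "Ibiza": {"beach": 2.0},
}
print(top_k_similar("Rome", vectors, cosine, k=2))  # ['Florence', 'Venice']
```

Because candidates are compared as (score, city) tuples, equal scores fall back to reverse alphabetical order of city names; a real system might prefer a different tie-break.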
The best system according to users (Figure 2) is System C (n=92 preferences). System C was the random dataset based on picture information (IDFP). It was followed by System B (n=66 preferences), the random dataset with standard IDF, along with System D (n=63 preferences), the random dataset based on picture users (IDFU). The last system was System A (n=53 preferences), created with knowledge gathered from the Top100 dataset. Non-valid answers (N) were only 22. Overall, 37.5% of the choices were given with a high or fairly good degree of confidence (Likert scale 4 and 5), while 34.8% of the choices were selected with a low or fairly low degree of confidence (Likert scale 1 and 2).
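A quick arithmetic check of the reported preferences, in Python: the four systems total 274 preferences, and adding the 22 non-valid answers gives 296, the number of observations reported on the method slide. The percentages printed by the loop are computed here for illustration and are not figures from the study.

```python
# Preferences per system as reported in the notes, plus the non-valid answers.
preferences = {"C": 92, "B": 66, "D": 63, "A": 53}
invalid = 22

valid = sum(preferences.values())
total = valid + invalid
print(valid, total)  # 274 296

# Share of each system, best first, over valid preferences and over all answers.
for system, n in sorted(preferences.items(), key=lambda kv: kv[1], reverse=True):
    print(f"System {system}: {n / valid:.1%} of valid preferences, {n / total:.1%} of all answers")
```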
Furthermore, users were asked whether they were willing to recommend the cities of the selected system to someone who had visited the main city: notably, aggregating the average, fairly good and high degrees of choice confidence (Likert scale values 3, 4 and 5, i.e. 65.2% of the observations), it is possible to define a degree of recommendation of 56.7% (Figure 3).
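The confidence figures are internally consistent: choices at Likert 3 to 5 are the complement of those at 1 and 2, so 100 − 34.8 = 65.2%, which leaves roughly 27.7% at the middle value 3. A small Python check of that arithmetic; the 56.7% recommendation rate is reported by the study and is not derived here.

```python
high = 37.5  # percent of choices at Likert 4-5, as reported
low = 34.8   # percent of choices at Likert 1-2, as reported

middle = round(100 - high - low, 1)      # percent at Likert 3, implied by the two figures
at_least_mid = round(100 - low, 1)       # percent at Likert 3-5
print(middle, at_least_mid)  # 27.7 65.2
```

Rounding to one decimal matches the precision of the reported percentages and hides floating-point noise from the subtraction.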