Can Web Search Be Enhanced For User-Generated
Content?
Evan Atkinson
Elon University
2434 W. Webb Ave
Apt. 205
Burlington, NC
1(862) 579-7039
eatkinson2@elon.edu
ABSTRACT
With so much content on the web, this paper asks how users can
get the most out of their web searches with regard to
user-generated content. The systems we use for web search can be
modified or optimized to give users better results. The main goal
of this literature review is to examine the research on web search
optimization, focusing mainly on the areas of tags, algorithms,
and URLs. These sources show that web search can be enhanced
through several different avenues. Enhanced web search
capabilities would allow users to gather better results tailored
specifically to them.
Keywords
Algorithms, URLs, tags, tag cloud, folksonomy, user-generated
content
1. INTRODUCTION
To understand user-generated content, one must first
understand the concept of social media. Social media has been one
of the biggest technological booms of recent years, and it
clearly has a strong hold on the average Internet consumer
today. According to Forrester Research, 75 percent of Internet
users used social media as of 2008, up from 56 percent the year
before (Kaplan & Haenlein 2010). Social media is not a new
concept, however; people have always longed for different
platforms to connect with one another, and each new way to
communicate was revolutionary.
Kaplan & Haenlein (2010) state that in 1998 Bruce and
Susan Abelson founded “Open Diary,” an early social media site
that connected online diary writers. Even though the term social
media is huge today, these social media sites have been around
longer than most people realize. People are always interested in
communicating with others, which is why the printing press and
telephones were so radical when they were released to the
public.
Blogs were the first forms of social media and today
there are over 100 million blogs available online (Kietzmann,
Hermkens, McCarthy, & Silvestre 2011). People have been
debating different ways to make these blogs accessible for users'
consumption, and there are search engines that specifically
target blogs, such as Technorati. Information such as this is very
valuable to many online users, but cannot be accessed through
regular web search engines. With the rise of micro-blogging
through Twitter, much of this information can be vital in searches.
According to Kietzmann et al. (2011), there are seven
functional blocks of social media: Identity, Presence,
Relationships, Sharing, Reputation, Groups, and Conversations.
Each block does not need to be present in each social media
platform, but these blocks can be used to analyze each platform
in a more thorough manner.
Rainie & Wellman (2012) reference the triple
revolution, which they propose is currently going on. The Social
Network, Internet, and Mobile Phone revolutions are coming
together at the same time. With this boom in all three platforms,
it has become much easier for people to create content, and
connect with one another. The Social Network revolution has
provided opportunities for people to reach beyond their
individual tight-knit worlds. The Internet revolution has given
people the power to communicate and the power to access an
immense amount of information. The Mobile revolution
has given people the ability to use powerful technology devices
wherever they go.
With this understanding of social media, we can
now delve into the subject of user-generated content (UGC) and
web search optimization. Web searches have always been a way for
users to find results on the web for a wide range of topics, but
in recent years the web has grown tremendously, and much of that
growth has to do with UGC. The question posed at the outset is
whether web searches can be enhanced to return better results
for UGC.
2. DISCUSSION
2.1 User-Generated Content
There is no standard definition of UGC, but according to
Balasubramaniam (2009) the Organisation for Economic
Co-operation and Development (OECD) defines UGC as content that
meets the following criteria:
i) It is made publicly available through the Internet.
ii) It shows a certain level of creativity.
iii) Perhaps most important, it is created outside of
professional practices.
Millions of people contribute to this UGC online culture that has
grown in recent years due to certain websites such as Delicious,
Wikipedia, and YouTube, which all have very different platforms
for users to generate content on.
For example, Wikipedia has more than two million
articles created by users in English alone, and it is one of the
fastest growing collaborative content outlets in the world (Nov,
2007). Balasubramaniam (2009) also states that people
contributing user-generated content are looking to connect with
people, to express themselves, and to receive recognition or
prestige for their work. Shirky (2008) similarly observes, “This
desire to make a meaningful contribution where we can is part of
what drives Wikipedia’s spontaneous division of labor.”
UGC is reshaping the way people use the Internet: it
is creating new social interactions and giving users the power
to be more creative, along with the ability to develop new
business or marketing opportunities. UGC has had a huge
impact on online video viewing with the creation of YouTube.
The constant waves of new videos created on the Web have made
watching videos a quick, personal viewing experience, leading
to great variability in user behavior and attention span (Cha,
Kwak, Rodriguez, Ahn, & Moon 2007). UGC of this nature has
greatly affected the strategies for marketing, recommendation,
and search engines.
There may be many different reasons for users to want
to generate content and ideas on the Internet, but that is not the
main focus here. The main focus is strictly on whether UGC can be
made more searchable in web searches. Whether users enjoy these
new collaborative platforms for expressing themselves or use them
to practice their skills and receive feedback, millions of users
are creating this content, and it needs to be accessible through
web search.
2.2 Social Media
Social networking sites also create a space for users to generate
content. With the rise of Twitter in recent years, it has become a
place for users to micro-blog and to upload important
information. Twitter has been known to have breaking news,
real-time content, and popular trends on a national and global
scale. This information could be quite interesting and important
to users who, for example, are searching on a news topic in a
web search. According to Teevan, Ramage, & Morris (2011),
users reported that the biggest factor in searching Twitter was
finding timely information, yet this content does not surface in
web searches. Teevan et al. (2011) report that almost 50 percent
of the searches have to do with the news or news trends, showing
that users are searching for informational content on social
media UGC sites.
With the popularity of Facebook, the way people
connect has changed for good. I believe these forums of
idea sharing, if they garner the right momentum, could completely
change the way people gather “important information”, which
would be filtered for them, and could potentially change the way
people “surf” the Internet. UGC is growing rapidly, and users of
the web need to be able to access all pertinent knowledge on
their search topics, even if it is UGC and not a conventional
search result.
To gain access to UGC and other information through
web searches, researchers have come up with a few different
solutions that can be grouped into categories. The main ways
researchers have focused on web search optimization have been
to use tags, URLs, and/or algorithms.
2.3 Tags
One of the most complicated challenges in navigating
this new user-generated world is how to organize relevant
information. In the world of UGC, bookmarking is one of the
most popular ways information is being organized. The rise in
popularity of websites, such as Delicious, has made this use of
tags very apparent in online culture. Delicious is considered one
of the first successful social bookmarking sites.
Golder & Huberman (2006) state that bookmarks are
useful because they can be accessed from any computer, not just
the user’s own browser. Each bookmark records the web page’s
URL and its title, as well as the time at which the bookmark was
created. Users can also choose a tag or multiple tags for each
bookmark of their choice.
According to Sinclair & Cardew-Hall (2008), tagging
services allow a participant to associate freely determined
keywords (‘tags’) with a particular resource. Tagging services
exist to tag an enormous variety of things such as photographs,
URLs, podcasts, computer games, music and videos. The dataset
arising from all the participants’ tags is commonly referred to as
a ‘folksonomy’.
Thomas Vander Wal, who coined the term folksonomy, says,
however, that folksonomy is not collaborative but is the result of
personal free tagging of information for one’s own retrieval.
Although tagging in itself is not collaborative, it
leads to a collaborative function on the web that can be used to
aid web search capabilities. Accessing these tags, or metadata,
can be vital to users’ search engine experiences.
Tags serve many purposes, and with UGC creating new
forums of creativity, bookmarking sites have flourished. Golder
& Huberman (2006) identify several functions that tags perform
for bookmarks: identifying what or who it is about, identifying
what it is, identifying who owns it, refining categories,
identifying qualities or characteristics, self-reference, and
task organizing.
As reported by Golder & Huberman (2006), tags have
many different functional aspects and have been used
successfully to help users organize bookmarks. The
study also focuses on finding regularities in user activity and
tag frequencies. The results showed that after the first 100 or
so bookmarks of a given URL, each tag’s frequency is a nearly
fixed proportion of the total frequency of all tags used. The
results also showed that this stability often appears after fewer
than 100 bookmarks, which shows a URL does not need to become
very popular for the tag data to be useful.
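The stabilization described above can be illustrated with a minimal sketch. It assumes each bookmark of a URL is simply the list of tags one user chose; the function names and the sample data are hypothetical and are not taken from Golder & Huberman's study. After every new bookmark the sketch recomputes each tag's share of all tag uses, and a second helper reports how far the shares moved between two snapshots.

```python
from collections import Counter


def tag_proportions(bookmarks):
    """Yield, after each bookmark, every tag's share of all tag uses so far."""
    counts = Counter()
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        yield {tag: n / total for tag, n in counts.items()}


def max_shift(prev, curr):
    """Largest absolute change in any tag's share between two snapshots."""
    tags = set(prev) | set(curr)
    return max(abs(curr.get(t, 0.0) - prev.get(t, 0.0)) for t in tags)


# Hypothetical bookmarks for one URL; each entry is one user's tag list.
bookmarks = [
    ["python", "tutorial"],
    ["python"],
    ["python", "code"],
    ["tutorial", "python"],
]
history = list(tag_proportions(bookmarks))
shifts = [max_shift(a, b) for a, b in zip(history, history[1:])]
```

On real data, a run of small values in `shifts` would correspond to the point at which the tag proportions have settled.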
With regard to the Delicious interface, users can add
bookmarks, and they can also see the tags chosen by people who
have already tagged that URL. This causes many users to imitate
the tag selection already used, which can lead to a consensus in
the common tags used for a URL. In this way, Delicious is not
directly a recommendation system, but by steering users toward
popular tags it acts like one.
Bischoff, Firan, Nejdl, & Paiu (2008) say
motivations for users to tag include organization, opinion
expression, attraction of attention, self-presentation, and
providing context to friends. Results show that in a
free-for-all system, opinion expression, self-presentation, and
activism seem to be major motivating factors, while in
self-tagging systems, such as Delicious or Flickr, users’ tagging
motivations seem to be predominantly for their own benefit, such
as better information organization.
Results also show that tags in bookmarking systems for
the most part provide a good summary of the web page they are
tagged to, and they can indicate the popularity of a page. Given
this, making popular information accessible through web
search is vital. Effective tags would yield better results for
users of web search. Guy, Zwerdling, Ronen, Carmel, &
Uziel (2010) focused on different types of tag recommendation
engines, which included a people-based recommender (PBR); a
tags-based recommender (TBR); two types of a hybrid
recommender (PTBR): a combination of people or tags (or-
PTBR), and a combination of people and tags (and-PTBR,
suggesting only items related to both people and tags); and a
popularity-based recommender (POPBR).
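The difference between the two hybrid variants can be sketched in a few lines. This is only an illustration of the candidate selection step, assuming the items related to a user's people and to a user's tags are already available as sets; the function names and item identifiers are hypothetical, and the scoring and ranking used by Guy et al. are not reproduced here.

```python
def or_ptbr(people_items, tag_items):
    """Candidates related to the user's people OR the user's tags (union)."""
    return people_items | tag_items


def and_ptbr(people_items, tag_items):
    """Candidates related to BOTH the user's people and tags (intersection)."""
    return people_items & tag_items


# Hypothetical item identifiers.
people_items = {"blog_1", "wiki_2", "file_3"}
tag_items = {"wiki_2", "file_3", "bookmark_4"}

or_candidates = or_ptbr(people_items, tag_items)
and_candidates = and_ptbr(people_items, tag_items)
```

The union yields a broader candidate pool, while the intersection yields a smaller one restricted to items supported by both kinds of evidence.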
Guy et al.’s (2010) results showed that the combination
of incoming tags and used tags is the most effective in
representing a user’s topics of interest. Recommendations based
on a TBR, with a tag profile that combines incoming and used
tags, are rated significantly more interesting than the most
effective PBR studied.
Recommended items are shown to be highly different
between the PBR and the TBR, with less than 2 percent overlap.
A hybrid PTBR recommender including explanations improves
the results slightly further, leading to an over 70:30 ratio between
interesting and non-interesting items. It also presents other
potential benefits over a TBR, such as a lower percentage of
already known items and higher diversity of item types (Guy et
al. 2010).
These data show that there are clear
benefits to different types of tag recommendation engines. If the
right recommendation engine is chosen, then combining a search
system with a tag-based engine that can access UGC and other
metadata could enhance web search capabilities.
With so many tags populating the Internet, a tag-cloud
system has also been created. Tag clouds are a new way to find
and visualize information that can be accessed in one click.
Through research on tag-cloud search engine interfaces, Trattner
et al.’s (2012) results show that, from the users’ perspective, both
tag-based browsing interfaces were perceived to be better than the
baseline interface (i.e., Google, Yahoo, Bing). The users
indicated that these interfaces provided significantly enhanced
support and reported significantly higher levels of confidence
that relevant information would be found. They also ranked both
tag-based browsing interfaces significantly higher overall than
the baseline interface.
Millen & Feinberg (2006) were interested in how
social tagging can improve social navigation. Their study
focused on a service called Dogear. The Dogear social
bookmarking service was designed to display bookmarks within
a navigation model that allows users to manage and explore the
collection in different ways. Results from Millen & Feinberg
(2006) confirm that social tags are used by a large number of the
users of the social bookmarking application under study.
Approximately 60 percent of the Dogear service visitors explored
the bookmark collection using one or more of the pivot links
(tags or people). These tags are linked to the URLs they
describe. A few studies have examined how useful these URLs are
for web search.
2.4 URLs
Kolay & Dasdan’s (2009) study shows that Delicious
URLs provide significant value to web search engines from the
perspective of content discovery and user search satisfaction. The
URLs lead to faster discovery of good-quality content. Also,
when these URLs are shown to users in search results, both the
number of them that receive clicks and the total number of clicks
they receive are significantly high.
Kandylas & Dasdan’s (2010) study instead focused on
the massive site Twitter. Twitter is being used to recommend
popular articles in real time, to track breaking news stories, for
work-related communication, and for brand marketing. Much of
this UGC would be of importance to users’ search results.
Kandylas & Dasdan’s (2010) results show that extracting
bitly URLs from Twitter can be useful for a web search engine.
The average URL quality is higher than that of a randomly
selected set. However, URL tweet count and lifetime provide
insight into some of the URLs but are not enough to filter out a
large portion of bad or spam URLs. These results are still a
bright spot in web search optimization.
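The two signals named above, tweet count and lifetime, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the input is a list of (URL, Unix timestamp) pairs, and the function names, sample data, and thresholds are all hypothetical. As the study cautions, a threshold filter of this kind would not by itself remove most spam.

```python
from collections import defaultdict


def url_stats(tweets):
    """Map each URL to (tweet_count, lifetime_seconds).

    Lifetime is the gap between the first and last tweet that mention the URL.
    """
    seen = defaultdict(list)
    for url, timestamp in tweets:
        seen[url].append(timestamp)
    return {url: (len(ts), max(ts) - min(ts)) for url, ts in seen.items()}


def keep_candidates(stats, min_count, min_lifetime):
    """Keep URLs meeting both thresholds (threshold values are assumptions)."""
    return sorted(
        url
        for url, (count, lifetime) in stats.items()
        if count >= min_count and lifetime >= min_lifetime
    )


# Hypothetical (url, unix_timestamp) pairs.
tweets = [
    ("url_a", 0), ("url_a", 3600), ("url_a", 7200),
    ("url_b", 100),
    ("url_c", 50), ("url_c", 60),
]
stats = url_stats(tweets)
kept = keep_candidates(stats, min_count=2, min_lifetime=3600)
```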
Wetzker, Zimmermann, & Bauckhage (2008) also
agree with Kandylas & Dasdan’s (2010) conclusion that more
research needs to be done. Wetzker et al.’s (2008) analysis of
social bookmarking systems showed that social bookmarking
provides a valuable source for information retrieval and social
data examination. However, the study found that spam could
highly distort any analysis.
Chen, Scripps, & Tan (2008) found similar
evidence of the spam problem. Their results show that social
bookmarking is a rich domain for applying link mining, although
it presents interesting research problems such as how to identify
potential collusion between users or tag spam in social
bookmarking data. Spam has the ability to invade user-generated
sites and the web as a whole, and clearly more research needs to
be done on how to better filter spam from user-generated social
media sites such as Twitter. Besides research on how folksonomy
and URLs can improve web search, there has also been research on
improving algorithms for web search.
2.5 Algorithms
Bateman, Muller, & Freyne (2009) presented an
algorithm that has potential to improve the user experience using
the popular pivot browsing mechanism by improving bookmark
orderings, thus reducing the number of query term refinements
needed to find a bookmark of interest.
Bogers & Van Den Bosch (2011) also studied this topic
with a focus on algorithmic improvements. They found that
approaches using tag overlap and metadata provide better results
for social bookmarking data sets than the transaction patterns
traditionally used in recommender systems research. They find
that fusing recommendations can indeed produce significant
improvements in recommendation accuracy. They also find that it
is often better to combine approaches that use different data
representations, such as tags and metadata, than to combine
approaches that only vary in the algorithms they use. The best
results are obtained when both of these aspects of the
recommendation task are varied in the fusion process.
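To make the idea of fusing recommendations concrete, here is a minimal sketch of one common scheme: a weighted sum of min-max normalized scores from two recommenders. It is not necessarily the fusion method Bogers & Van Den Bosch evaluated; the function names, weights, and scores are hypothetical and chosen only for illustration.

```python
def normalize(scores):
    """Min-max normalize a {item: score} dict into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse(runs, weights):
    """Rank items by the weighted sum of their normalized scores across runs."""
    fused = {}
    for run, weight in zip(runs, weights):
        for item, score in normalize(run).items():
            fused[item] = fused.get(item, 0.0) + weight * score
    # Highest fused score first; ties broken alphabetically.
    return sorted(fused, key=lambda item: (-fused[item], item))


# Hypothetical scores from a tag-based and a metadata-based recommender.
tag_run = {"a": 3.0, "b": 1.0, "c": 2.0}
metadata_run = {"b": 10.0, "c": 30.0, "d": 20.0}

ranking = fuse([tag_run, metadata_run], weights=[0.5, 0.5])
```

Item `c` rises to the top here because both runs score it reasonably well, which is the kind of agreement across data representations that fusion is meant to reward.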
Heymann, Koutrika, & Garcia-Molina (2008) found
through their study on tags, URLs, and social bookmarking that
social bookmarking as a data source for search has URLs that are
often actively updated and prominent in search results. The study
also found that tags were overwhelmingly relevant and objective.
Bao et al. (2007) also argue that social annotations can be used to
optimize and improve web search. This study focused on
optimizing web search through the aspects of similarity ranking
and static ranking. The results showed that social annotations
provided not only a summary, but also a good indicator of the
quality of web pages. Social annotations could benefit web
search in both similarity ranking and static ranking. The results
showed that similarity ranking can successfully find the relations
among annotations and static ranking can provide information
from the web annotators’ perspective.
3. CONCLUSION
Whether through tag, URL, or algorithmic improvements,
web search optimization is vital for many reasons. Users of web
search have the desire to be able to access any and all
information that is relevant to their search topics. If social
bookmarking continues to grow at the rate it has over the past
several years then it will quickly reach the scale of the current
web. These tags will have a large impact on improving the
quality of web search.
The studies presented provide a basis for future
research on web search optimization using tags, URLs, and
algorithmic improvements. UGC has proliferated across our virtual
worlds, and many users want and need to access this
information. Through future improvements in some
fundamental aspects of web search, these golden nuggets of UGC
will be more easily and readily accessed.
4. REFERENCES
[1] Balasubramaniam, N. (2009). User-generated
content. Business Aspects of the Internet of Things, 28
[2] Bao, S., Xue, G., Wu, X., Yu, Y., Fei, B., & Su, Z. (2007,
May). Optimizing web search using social annotations.
In Proceedings of the 16th international conference on
World Wide Web (pp. 501-510). ACM. Doi:
10.1145/1242572.1242640
[3] Bateman, S., Muller, M. J., & Freyne, J. (2009, May).
Personalized retrieval in social bookmarking.
In Proceedings of the ACM 2009 international conference
on Supporting group work (pp. 91-94). ACM. Doi:
10.1145/1531674.1531688
[4] Bischoff, K., Firan, C. S., Nejdl, W., & Paiu, R. (2008,
October). Can all tags be used for search?. In Proceedings
of the 17th ACM conference on Information and knowledge
management (pp. 193-202). ACM. Doi:
10.1145/1458082.1458112
[5] Bogers, T., & Van Den Bosch, A. (2011). Fusing
recommendations for social bookmarking web
sites. International Journal of Electronic Commerce, 15(3),
31-72. Doi: 10.2753/JEC1086-4415150303
[6] Cha, M., Kwak, H., Rodriguez, P., Ahn, Y. Y., & Moon, S.
(2007, October). I tube, you tube, everybody tubes:
analyzing the world's largest user generated content video
system. In Proceedings of the 7th ACM SIGCOMM
conference on Internet measurement (pp. 1-14). ACM. Doi:
10.1145/1298306.1298309
[7] Chen, F., Scripps, J., & Tan, P. N. (2008, December). Link
mining for a social bookmarking web site. In Web
Intelligence and Intelligent Agent Technology, 2008. WI-
IAT'08. IEEE/WIC/ACM International Conference on (Vol.
1, pp. 169-175). IEEE. Doi: 10.1109/WIIAT.2008.369
[8] Golder, S. A., & Huberman, B. A. (2006). Usage patterns of
collaborative tagging systems. Journal of information
science, 32(2), 198-208. Doi: 10.1177/0165551506062337
[9] Guy, I., Zwerdling, N., Ronen, I., Carmel, D., & Uziel, E.
(2010, July). Social media recommendation based on people
and tags. In Proceedings of the 33rd international ACM
SIGIR conference on Research and development in
information retrieval (pp. 194-201). ACM. Doi:
10.1145/1835449.1835484
[10] Heymann, P., Koutrika, G., & Garcia-Molina, H. (2008,
February). Can social bookmarking improve web search?.
In Proceedings of the international conference on Web
search and web data mining (pp. 195-206). ACM. Doi:
10.1145/1341531.1341558
[11] Kandylas, V., & Dasdan, A. (2010, April). The utility of
tweeted URLs for web search. In Proceedings of the 19th
international conference on World wide web (pp. 1127-
1128). ACM. Doi: 10.1145/1772690.1772837
[12] Kaplan, A. M., & Haenlein, M. (2010). Users of the world,
unite! The challenges and opportunities of Social
Media. Business horizons, 53(1), 59-68. Doi:
10.1016/j.bushor.2009.09.003
[13] Kietzmann, J. H., Hermkens, K., McCarthy, I. P., &
Silvestre, B. S. (2011). Social media? Get serious!
Understanding the functional building blocks of social
media. Business Horizons, 54(3), 241-251. Doi:
10.1016/j.bushor.2011.01.005
[14] Kolay, S., & Dasdan, A. (2009, April). The value of socially
tagged urls for a search engine. In Proceedings of the 18th
international conference on World wide web (pp. 1203-
1204). ACM. Doi: 10.1145/1526709.1526929
[15] Millen, D. R., & Feinberg, J. (2006, June). Using social
tagging to improve social navigation. In Workshop on the
Social Navigation and Community based Adaptation
Technologies.
[16] Nov, O. (2007). What motivates
wikipedians? Communications of the ACM, 50(11), 60-64.
[17] Rainie, L., & Wellman, B. (2012). Networked:
The new social operating system. The MIT Press.
[18] Shirky, C. (2008). Here comes everybody: The power of
organizing without organizations. Penguin.
[19] Sinclair, J., & Cardew-Hall, M. (2008). The folksonomy tag
cloud: when is it useful?. Journal of Information
Science, 34(1), 15-29. Doi: 10.1177/0165551506078083
[20] Spiteri, L. F. (2013). The structure and form of folksonomy
tags: The road to the public library catalog. Information
technology and libraries, 26(3), 13-25. Doi:
10.6017/ital.v26i3.3272
[21] Teevan, J., Ramage, D., & Morris, M. R. (2011, February).
# TwitterSearch: a comparison of microblog search and web
search. In Proceedings of the fourth ACM international
conference on Web search and data mining (pp. 35-44).
ACM. Doi: 10.1145/1935826.1935842
[22] Trattner, C., Lin, Y. L., Parra, D., Yue, Z., Real, W., &
Brusilovsky, P. (2012, June). Evaluating tag-based
information access in image collections. In Proceedings of
the 23rd ACM conference on Hypertext and social
media (pp. 113-122). ACM. Doi:
10.1145/2309996.2310016
[23] Wetzker, R., Zimmermann, C., & Bauckhage, C. (2008,
July). Analyzing social bookmarking systems: A del.icio.us
cookbook. In Proceedings of the ECAI 2008 Mining Social
Data Workshop (pp. 26-30).