2. Outline
What is collective intelligence?
Basic technical concepts behind collective intelligence
Many forms of user interaction
Example of how user interaction is converted into
intelligence
3. Web users are undergoing a transformation…
Users are expressing themselves. This expression may
be in the form of:
sharing their opinions on a product or a service through
reviews or comments; through sharing and tagging
content; through participation in an online community; or
by contributing new content.
This increased user interaction and participation gives rise
to data that can be converted into intelligence in your
application. The use of intelligence to personalize a site for
a user, to aid him in searching and making decisions, and
to make the application more sticky are cherished goals
that web applications try to fulfill.
4. Wisdom of the Crowd
“Under the right circumstances, groups are extraordinarily
intelligent, and are often smarter than the smartest people
in them.”
“If the process is sound, the more people you involve in
solving a problem, the better the result will be.”
A crowd’s collective intelligence will produce better results
than those of a small group of experts if four basic
conditions are met.
5. The “wise crowds” are valuable when they’re
composed of individuals who…
Have diverse opinions;
When the individuals aren’t afraid to express their
opinions;
When there’s diversity in the crowd; and
When there’s a way to aggregate all the information and
use it in the decision-making process.
6. Collective intelligence
To effectively use the information provided by others to
improve one’s application.
When a group of individuals collaborate or compete with
each other, intelligence or behavior that otherwise
didn’t exist suddenly emerges.
7. A user may be influenced by other users either directly
or through intelligence derived from the applications by
mining the data.
8. Collective intelligence of users is
The intelligence that’s extracted out from the collective
set of interactions and contributions made by users.
The use of this intelligence to act as a filter for what’s
valuable in your application for a user
—This filter takes into account a user’s preferences and
interactions to provide relevant information to the user.
There are a huge number of ways this information can
be processed and interpreted.
9. To apply collective intelligence in your
application, you need to
1. allow users to interact with your site and with each
other, learning about each user through their
interactions and contributions.
2. aggregate what you learn about your users and their
contributions using some useful models.
3. leverage those models to recommend relevant content
to a user.
10. Three components to harnessing intelligence:
1 – Allow users to interact, 2 – Learn about your users in
aggregate, 3 – Personalise content using user
interactions data and aggregate data.
11. Sources of information
Content-based—
based on information about the item itself, usually
keywords or phrases occurring in the item.
Collaborative-based—
based on the interactions of users.
12. Algorithms for applying Intelligence
correlate users with content and with each other,
need a common language to compute relevance between
items, between users, and between users and items.
Content-based relevance is anchored in the content
itself… (information retrieval systems)
Collaborative-based relevance leverages the user
interaction data to detect meaningful relationships
Unstructured text: to understand how metadata can be
developed from unstructured text
13. Abstracting types of content
applications
users and items
Items?
social-networking: user is also a type of item
Metadata: professionally developed keywords, user-generated tags,
keywords extracted by an algorithm after analyzing the text,
ratings, popularity ranking, etc.
Profile-based and user-action-based data
Metadata as a set of attributes that help qualify an item.
14. Sources for generating metadata about an item
users and items having an associated vector of
metadata attributes.
The similarity or relevance between two users or two
items or a user and item measured by looking at the
similarity between the two vectors.
17. Generating intelligence
Content-based analysis and collaborative filtering
to build a representation for the content
• Terms or phrases
• Terms are converted into their basic form by a process known as
stemming. Terms with their associated weights (term vectors), then
represent the metadata associated with the text. Similarity between
two content items is measured by measuring the similarity associated
with their term vectors.
to use the information provided by the interactions of
users to predict items of interest for a user
• to match a user’s metadata to that of other similar users and
recommend items liked by them (Language independent methods)
• E.g. users rate items, so a CF approach finds patterns in the way items
have been rated by the user and other users to find additional items of
interest for a user
• Amazon, Netflix, and Google
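The content-based half of this slide (stem the terms, weight them, compare term vectors) can be sketched roughly as follows. This is a minimal illustration, not the method used by any of the sites named above: `toy_stem` is a made-up suffix stripper standing in for a real stemmer, the weights are plain term frequencies, and no stop-word removal is attempted.

```python
import math
import re
from collections import Counter


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_vector(text):
    """Return a unit-length term vector {stem: weight} based on term frequency."""
    stems = [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(stems)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def similarity(vec_a, vec_b):
    """Dot product of two unit-length term vectors (their cosine similarity)."""
    return sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())


# Illustrative texts only; they are not taken from the slides.
doc_a = term_vector("A user rates photos and tags photos")
doc_b = term_vector("Users rated a photo")
doc_c = term_vector("Stock markets closed lower")

print(similarity(doc_a, doc_b))  # shares stems with doc_a, so greater than zero
print(similarity(doc_a, doc_c))  # no shared stems, so zero
```

A production system would typically swap in an established stemmer and a weighting scheme such as TF-IDF; the comparison step stays the same.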
18. Collaborative filtering
Memory-based and model-based
a similarity measure is used to find similar users and then
make a prediction using a weighted average of the ratings
of the similar users
to build a model for prediction using a variety of
approaches: linear algebra, probabilistic methods, neural
networks, clustering, latent classes, and so on
A collaborative filtering algorithm usually works by
searching a large group of people and finding a smaller set
with tastes similar to yours.
Collecting Preferences
Recommending Items
Matching Products
Item-Based Filtering
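The memory-based approach described on this slide (find similar users, then predict with a weighted average of their ratings) might look like the sketch below. The ratings, user names, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
import math

# Hypothetical ratings (user -> {item: rating}); not taken from the slides.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 4, "item2": 2, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine(ratings[user], other_ratings)
        numerator += weight * other_ratings[item]
        denominator += weight
    return numerator / denominator if denominator else None


# alice has not rated item4; bob is more similar to her than carol is,
# so the prediction lands closer to bob's rating of 4 than to carol's 2.
print(predict("alice", "item4"))
```

A model-based approach would replace `predict` with a model trained offline, which trades freshness for faster lookups.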
19. Harnessing Collective Intelligence to transform
from content-centric to user-centric applications
Prior to the user-centric revolution, many applications put
little emphasis on the user. These applications, known as
content-centric applications, focused on the best way to
present the content and were generally static from user
to user and from day to day.
User-centric applications leverage Collective Intelligence
to fundamentally change how the user interacts with the
Web application.
User-centric applications make the user the center of the
web experience and dynamically reshuffle the content
based on what’s known about the user and what the user
explicitly asks for.
20. User-centric applications are composed of the
following four components
Core competency: The main reason
why a user comes to the application.
Community: Connecting users with
other users of interest, social
networking, finding other users who may
provide answers to a user’s questions.
Leveraging user-generated content:
Incorporating generated content and
interactions of users to provide additional
content to users.
Building a marketplace: Monetizing
the application by product and/or service
placements and showing relevant
advertisements.
23. Concept of a dataset
Densely populated dataset
• It has more rows than columns
• The dataset is richly populated
Clustering & build a predictive model
E.g. similar users according to age and/or sex might be a
good predictor of the number of minutes a user will spend
on the site
• age a good predictor
• the number of minutes spent decreases as the user's
age increases
• a simple linear model
• minutes spent = 50 – age of user
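As a rough illustration of how such a linear model could be recovered from a densely populated dataset, the sketch below fits a straight line by ordinary least squares. The four (age, minutes) rows are invented so that they lie exactly on the slide's line; real data would be noisier.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: one row per user (age, minutes spent on the site).
ages = [25, 30, 35, 40]
minutes = [25, 20, 15, 10]

slope, intercept = fit_line(ages, minutes)
print(slope, intercept)  # slope of -1 and intercept of 50 for this data


def predicted_minutes(age):
    """Apply the fitted model; flooring at zero is an added assumption."""
    return max(0.0, slope * age + intercept)


print(predicted_minutes(33))
```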
24. Concept of a dataset
• Set of users viewed any of the videos on
our site within the timeframe
High-dimensional, sparsely populated
a generalization of the term vector representation
This representation is useful to find similar users and is
known as the User Item matrix
users are represented as rows
the total number of videos represented as columns
Properties: more rows than columns, richly populated
25. Users are represented as
columns
the videos as rows
Users who have viewed this
video have also viewed
these other videos
Properties
number of columns is large,
sparsely populated with
nonzero entries in a few
columns
multidimensional vector
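Because most users view only a few videos, both orientations of the matrix are mostly zeros. One way to hold them, sketched here with hypothetical user and video IDs, is to store only the nonzero entries as sets and transpose on demand.

```python
from collections import defaultdict

# Hypothetical view log: (user, video) pairs observed within some timeframe.
views = [("u1", "v1"), ("u1", "v3"), ("u2", "v1"), ("u2", "v2"), ("u3", "v3")]


def build_user_item(view_log):
    """Users as rows: map each user to the set of videos they viewed."""
    matrix = defaultdict(set)
    for user, video in view_log:
        matrix[user].add(video)
    return dict(matrix)


def transpose(user_item):
    """Videos as rows: map each video to the set of users who viewed it."""
    item_user = defaultdict(set)
    for user, videos in user_item.items():
        for video in videos:
            item_user[video].add(user)
    return dict(item_user)


def also_viewed(video, user_item, item_user):
    """Count the other videos seen by users who viewed `video`."""
    counts = defaultdict(int)
    for user in item_user.get(video, ()):
        for other in user_item[user]:
            if other != video:
                counts[other] += 1
    return dict(counts)


user_item = build_user_item(views)
item_user = transpose(user_item)
print(also_viewed("v1", user_item, item_user))
```

The user-as-row form supports finding similar users, while the transposed form answers the "users who have viewed this video have also viewed" question directly.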
26. Forms of user interaction
need to quantify the quality of the interaction
Rating and voting interaction
explicit in the user’s intent
way of getting feedback on how well the user liked the
item
is quantifiable and can be used directly
Voting is similar to rating. However, a vote can have only
two values: 1 for a positive vote and -1 for a negative
vote
interactions such as using clicks are noisy—the intent of
the user isn’t known perfectly and is implicit
27. Persistence of
ratings
Entities:
User & Items
user_item_rating is a mapping table that has a
composite key, consisting of the user ID and
content ID
The cardinality between the entities shows that
• Each user may rate 0 or more items.
• Each rating is associated with only one user.
• An item may contain 0 or more ratings.
• Each rating is associated with only one item.
digg.com allows users to contribute and vote
on interesting articles
What are the top 10 rated items?
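A minimal sketch of this persistence model, using Python's built-in `sqlite3` module with an in-memory database. Only the `user_item_rating` table name and its composite key come from the slide; the `users` and `items` table names, the column names, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT);
    -- Mapping table: the composite key allows one rating per (user, item) pair.
    CREATE TABLE user_item_rating (
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        item_id INTEGER NOT NULL REFERENCES items(item_id),
        rating  INTEGER NOT NULL,
        PRIMARY KEY (user_id, item_id)
    );
    """
)

# Hypothetical sample data.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "John"), (2, "Jane"), (3, "Doe")])
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "Article A"), (2, "Article B"), (3, "Article C")])
conn.executemany("INSERT INTO user_item_rating VALUES (?, ?, ?)",
                 [(1, 1, 5), (2, 1, 4), (1, 2, 2), (3, 2, 3), (2, 3, 5)])

# "What are the top 10 rated items?" expressed as an average-rating query.
top_rated = conn.execute(
    """
    SELECT i.title, AVG(r.rating) AS avg_rating, COUNT(*) AS num_ratings
    FROM user_item_rating AS r
    JOIN items AS i ON i.item_id = r.item_id
    GROUP BY i.item_id, i.title
    ORDER BY avg_rating DESC, num_ratings DESC
    LIMIT 10
    """
).fetchall()

for title, avg_rating, num_ratings in top_rated:
    print(title, avg_rating, num_ratings)
```

Ranking by the raw average favors items with very few ratings, so a real site would likely add a minimum-count threshold or a damped average.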
28. Forwarding a link
Similar to voting,
forwarding the content
to others can be
considered a positive
vote for the item by the
user
29. Bookmarking and saving
By bookmarking
URLs, a user is
explicitly
expressing interest
in the material
associated with the
bookmark. URLs
that are commonly
bookmarked
bubble up higher in
the site.
30. Purchasing items
users purchase items
casting an explicit vote
of confidence in the
item
Amazon (Item-to-Item
recommendation engine)
Users that buy similar
items can be correlated
Items that have been
bought by other users can
be recommended to a user
…
31. Click-stream news.google.com personalisation
When a list of items is
presented to a user … …
positive vote item
clicked
Looking at whether an item
was visited and the time
spent on it provides useful
information.
You can also gather useful
statistics from this data:
• What is the average time a
user spends on a particular
item?
• For a user, what is the
average time spent on any
given article?
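Both statistics reduce to a grouped average over click-stream records. The sketch below assumes each record is a `(user, item, seconds)` tuple; the record layout and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical click-stream records: (user, item, seconds spent on the item).
clicks = [
    ("john", "a1", 30),
    ("john", "a2", 90),
    ("jane", "a1", 60),
    ("doe", "a1", 90),
]


def average_by(records, key_index):
    """Average the time spent, grouped by the field at `key_index`."""
    totals = defaultdict(lambda: [0, 0])  # key -> [total seconds, visit count]
    for record in records:
        entry = totals[record[key_index]]
        entry[0] += record[2]
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


avg_time_per_item = average_by(clicks, 1)  # average time spent on each item
avg_time_per_user = average_by(clicks, 0)  # average time a user spends per article

print(avg_time_per_item)
print(avg_time_per_user)
```

Comparing a user's time on one article against that user's own average is one way to turn a noisy click into a graded signal of interest.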
32. Reviews
Opinions and tastes are often expressed through
reviews and recommendations. These have the
greatest impact on other users when
They’re unbiased
The reviews are from similar users
They’re from a person of influence
Just like voting for articles at Digg, other users
can endorse a reviewer or vote on his reviews
33. Converting user interaction into intelligence
User interaction gets converted into a dataset for learning.
three users who’ve rated photographs
a number of ways to transform raw ratings from users into
intelligence
aggregate all the ratings about the item and provide the average
• to create a Top 10 Rated Items list
• constantly promoting the popular content
34. Given the set of data, we answer two questions in our example:
What is the set of related items for a given item?
For a user, who are the other users that are similar to the user?
Three approaches:
• cosine-based similarity,
• correlation-based similarity, and
• adjusted-cosine-based similarity.
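Correlation-based similarity is commonly computed as the Pearson correlation between two rating vectors, which removes each vector's mean before comparing them. A minimal sketch, with made-up ratings from three users for two photos:

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator if denominator else 0.0


# Hypothetical ratings given by the same three users to two photos.
photo_a = [3, 2, 1]
photo_b = [4, 2, 3]

print(pearson(photo_a, photo_b))  # about 0.5 for these values
```

Adjusted cosine follows the same pattern but subtracts each user's mean rating rather than each item's, which compensates for users who rate consistently high or low.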
35. Cosine-based similarity computation
takes the dot product of two vectors
to learn about the photos, we transpose the matrix
photos
a row corresponds to a photo while
the columns (users) correspond to
dimensions that describe the photo
normalize the values for each of the rows
by dividing each of the cell entries by the square root of the sum of
the squares of entries in a particular row
The similarity between Photo 1 and Photo 2 is computed as
(0.8018 * 0.7428) + (0.5345 * 0.3714) + (0.2673 * 0.557) = 0.943
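The normalization and dot product above can be reproduced in a few lines. The raw ratings used here, (3, 2, 1) for Photo 1 and (4, 2, 3) for Photo 2, are an assumption: they were chosen because they normalize to the values quoted in the calculation.

```python
import math

# Assumed raw ratings from three users; one row per photo after transposing.
photo_ratings = {
    "Photo1": [3, 2, 1],
    "Photo2": [4, 2, 3],
}


def normalize(row):
    """Divide each entry by the square root of the sum of squares of the row."""
    length = math.sqrt(sum(value * value for value in row))
    return [value / length for value in row]


def cosine_similarity(row_a, row_b):
    """Dot product of the two normalized rows."""
    return sum(a * b for a, b in zip(normalize(row_a), normalize(row_b)))


print([round(v, 4) for v in normalize(photo_ratings["Photo1"])])
print([round(v, 4) for v in normalize(photo_ratings["Photo2"])])
print(round(cosine_similarity(photo_ratings["Photo1"], photo_ratings["Photo2"]), 3))
```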
36. Item-to-item similarity table
What is the set of related items for a given item?
According to the table, Photo1 and Photo2 are very similar.
To determine similar users,
associated with each user is a vector, where the rating associated
with each item corresponds to a dimension in the vector
analysis process is similar to calculating the item-to-item similarity
table
User-to-user similarity table
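A user-to-user table can be built the same way as the item-to-item one, with each user's ratings forming the vector. The sketch below uses hypothetical ratings for two photos only; the actual tables from the slides are not reproduced here.

```python
import math
from itertools import combinations

# Hypothetical ratings: user -> {photo: rating}.
user_ratings = {
    "John": {"Photo1": 3, "Photo2": 4},
    "Jane": {"Photo1": 2, "Photo2": 2},
    "Doe": {"Photo1": 1, "Photo2": 3},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    items = sorted(set(a) | set(b))
    va = [a.get(item, 0) for item in items]
    vb = [b.get(item, 0) for item in items]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def similarity_table(vectors):
    """Symmetric table of pairwise similarities, with 1.0 on the diagonal."""
    table = {name: {name: 1.0} for name in vectors}
    for a, b in combinations(vectors, 2):
        table[a][b] = table[b][a] = cosine(vectors[a], vectors[b])
    return table


table = similarity_table(user_ratings)
for name, row in table.items():
    print(name, {other: round(score, 3) for other, score in row.items()})
```

Passing photo vectors instead of user vectors to `similarity_table` yields the item-to-item table, since the analysis is the same in both directions.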
37. Intelligence from other forms of user
interactions
How other forms of user-interaction get transformed into
metadata?
Approaches
content-based and
collaboration-based
38. content-based approach
metadata is associated with each item
This term vector could be created by analyzing the content of
the item or using tagging information by users
The term vector consists of keywords or tags with a relative
weight associated with each term.
As the user saves content, visits content, or writes
recommendations, she inherits the metadata associated with each.
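One way to picture this inheritance is to add each item's term vector into the user's profile whenever she acts on the item. In the sketch below the item names, term weights, and per-action weights are all hypothetical; in particular, weighting a save or a recommendation more heavily than a visit is an assumption, not something the slide states.

```python
from collections import defaultdict

# Hypothetical item metadata: item -> {term: relative weight}.
item_terms = {
    "article1": {"python": 0.8, "web": 0.6},
    "article2": {"web": 0.6, "search": 0.8},
}

# Assumed strength of each kind of interaction.
action_weight = {"visit": 1.0, "save": 2.0, "recommend": 3.0}


def build_profile(actions):
    """Accumulate item term vectors into a user term vector."""
    profile = defaultdict(float)
    for action, item in actions:
        for term, weight in item_terms[item].items():
            profile[term] += action_weight[action] * weight
    return dict(profile)


# The user visited article1 and saved article2.
profile = build_profile([("visit", "article1"), ("save", "article2")])
print(profile)
```

The resulting profile is itself a term vector, so it can be compared against item vectors to find content relevant to the user.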
39. Collaboration-based approach
analysis of data collected by bookmarking, saving an item,
recommending an item
a sparsely populated dataset
What are other items that have been bookmarked by other
users who bookmarked the same articles as a specific
user?
When the user is John, the answer is Article 3 — Doe has
bookmarked Article 1 and also Article 3.
What are the related items based on the bookmarking
patterns of the users?
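The first question can be answered with plain set operations on the bookmark data. In the sketch below, John's and Doe's bookmarks follow the example above; Jane's row is an assumption added to make the dataset complete.

```python
from collections import Counter

# User -> set of bookmarked articles.
bookmarks = {
    "John": {"Article 1"},
    "Jane": {"Article 2", "Article 3"},  # assumed row, not stated in the text
    "Doe": {"Article 1", "Article 3"},
}


def co_bookmarked(user, data):
    """Articles bookmarked by users who share at least one bookmark with `user`."""
    mine = data[user]
    counts = Counter()
    for other, theirs in data.items():
        if other != user and mine & theirs:
            counts.update(theirs - mine)
    return counts.most_common()


print(co_bookmarked("John", bookmarks))  # [('Article 3', 1)]
```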
40. Collaboration-based approach
Here it is useful to invert the dataset:
The users correspond to the dimensions of the vector for an article.
Similarities between two items are measured by computing the dot
product between them
The normalized matrix is
The item-to-item similarity matrix based on this data is
LEARNING: if someone bookmarks Article 1,
you should recommend Article 3 to the user,
and if the user bookmarks Article 2, you should
also recommend Article 3
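The inverted-matrix analysis can be sketched as follows. The bookmark vectors are an assumption (columns are John, Jane, Doe, with Jane's bookmarks chosen to be consistent with the learning stated above), since the matrices themselves are not reproduced in the text.

```python
import math

# Each article is a vector over the users (John, Jane, Doe); 1 means bookmarked.
article_vectors = {
    "Article 1": [1, 0, 1],
    "Article 2": [0, 1, 0],
    "Article 3": [0, 1, 1],
}


def normalize(vec):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(v * v for v in vec))
    return [v / length for v in vec] if length else list(vec)


def item_similarity(vectors):
    """Item-to-item similarity: dot products of the normalized rows."""
    normed = {name: normalize(vec) for name, vec in vectors.items()}
    return {
        a: {b: sum(x * y for x, y in zip(normed[a], normed[b])) for b in normed}
        for a in normed
    }


def recommend(article, table):
    """Most similar other article, or None when nothing overlaps."""
    candidates = {b: s for b, s in table[article].items() if b != article and s > 0}
    return max(candidates, key=candidates.get) if candidates else None


table = item_similarity(article_vectors)
print(recommend("Article 1", table))  # Article 3
print(recommend("Article 2", table))  # Article 3
```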
41. A similar analysis can be performed by using
information from the items the user
saves,
purchases, and
recommends.
You can further refine your analysis by
associating
data only from users that are similar to a user based on
user-profile information.
42. Summary
Metadata associated with users and items can be used
to derive intelligence in the form of building
recommendation engines and predictive models for
personalization, and for enhancing search.
43. References
S. Alag, Collective Intelligence in Action, Manning, 2009
H. Marmanis, D. Babenko, Algorithms of the Intelligent Web,
Manning, 2009
T. Segaran, Programming Collective Intelligence: Building Smart Web
2.0 Applications, O’Reilly
Wang, Jun, Arjen P. de Vries, and Marcel J.T. Reinders. Unifying User-
based and Item-based Collaborative Filtering Approaches by Similarity
Fusion. 2006.
http://ict.ewi.tudelft.nl/pub/jun/sigir06_similarityfuson.pdf
44. Thank you!
Rajendra Akerkar
Vestlandsforsking, Sogndal, NORWAY
E-mail: rak@vestforsk.no
URL: www.tmrfindia.org/ra.html