1. WP3 User profiling and Recommendation (Part 1)
BBC, Pro-netics, VUA
Wednesday, March 28, 12
2. Contents
Overview
User profiling
General goal & approach
From activity streams to profile
Issues
Analytics
Beancounter
Recommendations
General goal & approach
Semantic recommendation
Statistical recommendation
Hybrid recommendation
Exploitation
Conclusions
26-27 March 2012 NoTube 3rd Review 2
3. Overview
[Figure: architecture overview. Labels recoverable from the diagram: EPG Metadata (BBC); TV Program Enrichment; Semantic Content Patterns for TV Programs; RDF Graph TV Programs; User Ratings & Demographics (BBC EPG Data); User Data Analysis; Similarity Clusters of Programs; Semantic Pattern-based Recommendation Strategy; Statistical Similarity-based Recommendation Strategy; Hybrid Recommendation Strategy; Recommendation Service; End-Users]
4. Overview
[Figure: the same architecture overview as on the previous slide, with the label BEANCOUNTER added]
5. User profiling approach
users’ interests and behaviours can be inferred from their activities on the Social Web
• tweets
• liked Facebook resources
• songs listened to
• ...
interests in topics are represented using Linked Data web identifiers
• to access a wealth of open and machine-readable data
• to publish profiles in compliance with the LOD paradigm
• to leverage the graph-based model of such data sets
6. User profiling: Challenge
main challenge: extracting meaningful data from
different sources of user activities
to produce LOD identifiers from activities:
• “follow-your-nose”: a record-linkage-based approach
• semantic annotation: NLP techniques applied to raw text
interests are weighted to represent their descriptiveness
user profiles are syndicated using JSON, JSON-P and RDF
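A minimal sketch of a weighted profile and its JSON and JSON-P syndication. The class names, field names, the user "alice", the weights and the callback name are assumptions made for illustration; the slides do not show the actual schema. RDF output is left out because it would need a vocabulary that the slides do not name.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interest:
    resource: str  # Linked Data identifier of the topic
    weight: float  # descriptiveness of the interest (illustrative scale)


@dataclass
class UserProfile:
    username: str
    interests: List[Interest] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "username": self.username,
            "interests": [
                {"resource": i.resource, "weight": i.weight}
                for i in self.interests
            ],
        }

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), sort_keys=True)

    def to_jsonp(self, callback: str) -> str:
        # JSON-P wraps the JSON payload in a call to a client-supplied function.
        return f"{callback}({self.to_json()});"


# Hypothetical profile; the weights are made up for the example.
profile = UserProfile(
    "alice",
    [
        Interest("http://dbpedia.org/resource/Matt_Lucas", 0.6),
        Interest("http://dbpedia.org/resource/David_Walliams", 0.4),
    ],
)
print(profile.to_json())
print(profile.to_jsonp("handleProfile"))
```

The JSON-P form carries the same payload as the JSON form, so a single `to_dict` keeps the two outputs consistent.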
7. User profiling: Follow-your-nose
“follow-your-nose”, record-linkage based
record linkage is “the problem of recognising those records in
two files which represent identical persons, objects or events
(said to be matched).”
we adopted a text-retrieval version: incremental, constrained, multiple text searches
facebook.com/pages/Shoeshine/ dbpedia.org/resource/
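One possible reading of "incremental, constrained, multiple text searches", sketched against a tiny in-memory index. This is an assumption about the general idea, not the project's implementation: the index, the `kind` constraint, the matching rules and the example.org entry are all hypothetical. Searches run from strictest to loosest, and a result is accepted only when exactly one candidate matches.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    label: str
    kind: str
    uri: str


# Hypothetical local index; a real system would query a text index of LOD labels.
INDEX = [
    Candidate("Matt Lucas", "person", "http://dbpedia.org/resource/Matt_Lucas"),
    Candidate("David Walliams", "person", "http://dbpedia.org/resource/David_Walliams"),
    Candidate("Lucas", "place", "http://example.org/resource/Lucas"),  # made-up entry
]


def normalise(text: str) -> str:
    return " ".join(text.lower().split())


def link(name: str, kind: Optional[str] = None) -> Optional[str]:
    """Return the identifier of the single best match, or None if unsure."""
    query = normalise(name)
    tokens = set(query.split())
    if not tokens:
        return None
    # Constraint: when a kind is given, only candidates of that kind are searched.
    pool = [c for c in INDEX if kind is None or c.kind == kind]
    # Multiple searches, ordered from strictest to loosest text match.
    searches = [
        lambda c: normalise(c.label) == query,
        lambda c: tokens <= set(normalise(c.label).split()),
        lambda c: bool(tokens & set(normalise(c.label).split())),
    ]
    for matches in searches:
        hits = [c for c in pool if matches(c)]
        if len(hits) == 1:  # unambiguous: accept and stop
            return hits[0].uri
    return None  # nothing, or only ambiguous matches: do not guess


print(link("Lucas", kind="person"))
print(link("David Lucas", kind="person"))
```

Returning `None` for ambiguous matches trades recall for precision, which matters because a wrong link adds noise to the profile.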
8. User profiling: Semantic Annotation
for some activities the “follow-your-nose” approach is not suitable
tweets and other text resources need Natural Language Processing techniques
• semantic annotation using LUpedia (WP4)
lookup for LOD identifiers from:
• tweet text
• #hashtags definitions
• linked Web pages
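A minimal sketch of how a tweet could be split into the three lookup sources and then annotated. The gazetteer is a hypothetical stand-in for the LUpedia service, whose API is not shown in these slides; the function names and regular expressions are likewise assumptions.

```python
import re

# Hypothetical gazetteer standing in for the LUpedia lookup service.
GAZETTEER = {
    "matt lucas": "http://dbpedia.org/resource/Matt_Lucas",
    "david walliams": "http://dbpedia.org/resource/David_Walliams",
}

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def split_sources(tweet: str) -> dict:
    """Separate a tweet into plain text, hashtags and linked pages."""
    links = URL.findall(tweet)
    without_links = URL.sub(" ", tweet)
    hashtags = HASHTAG.findall(without_links)
    text = " ".join(HASHTAG.sub(" ", without_links).split())
    return {"text": text, "hashtags": hashtags, "links": links}


def annotate(text: str) -> list:
    """Return identifiers of gazetteer entries mentioned in the text,
    ordered by first appearance."""
    lowered = text.lower()
    found = []
    for label, uri in GAZETTEER.items():
        match = re.search(r"\b" + re.escape(label) + r"\b", lowered)
        if match:
            found.append((match.start(), uri))
    return [uri for _, uri in sorted(found)]


parts = split_sources("Bubbles Devere is the best thing ever. #littlebritain")
print(parts)
print(annotate(parts["text"]))
print(annotate("Brilliant british humor by Matt Lucas & David Walliams"))
```

With this toy gazetteer the tweet text alone yields no identifiers, while descriptive text about the hashtag does, which is why hashtag definitions and linked pages are looked up in addition to the tweet text.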
12. User profiling: Semantic Annotation
Bubbles Devere is the best thing ever. #littlebritain
Brilliant British humor by Matt Lucas & David Walliams - whole range of fascinating characters portraying diversity of British society
WP4 Enrichment
http://dbpedia.org/resource/Matt_Lucas
http://dbpedia.org/resource/David_Walliams
13. User profiling: Issues
non-deterministic record-linkage and semantic annotation
could introduce noise
• noisy data leads to misleading profiles
• recommendations could be affected
hence, we introduced interest weights
• to minimise the effect of potential noise: poorly descriptive interests get lower weights
• to represent the evolution of a single interest: interests that recur over time gain more weight
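A minimal sketch of one way such weights could behave. The formula is an assumption for illustration and not the project's weighting scheme: each interest is counted once per day on which it was observed, the counts are normalised, and interests under a hypothetical threshold are dropped. The example.org identifier is made up.

```python
from collections import Counter


def weigh(observations, min_weight=0.3):
    """observations: iterable of (day, uri) pairs. Returns {uri: weight}."""
    # Count each interest once per day, so recurrence over time, rather than
    # a burst of activity on a single day, is what raises the weight.
    days_seen = Counter(uri for _, uri in set(observations))
    total = sum(days_seen.values())
    weights = {uri: n / total for uri, n in days_seen.items()}
    # Low-weight interests are treated as likely noise and filtered out.
    return {uri: w for uri, w in weights.items() if w >= min_weight}


LUCAS = "http://dbpedia.org/resource/Matt_Lucas"
NOISE = "http://example.org/resource/Wrong_Match"  # hypothetical mis-linked interest

observations = [
    ("2012-03-01", LUCAS),
    ("2012-03-01", LUCAS),  # same day: counted once
    ("2012-03-02", LUCAS),
    ("2012-03-03", LUCAS),
    ("2012-03-02", NOISE),
]
print(weigh(observations))
```

Lowering `min_weight` keeps more interests at the cost of letting more noise into the profile.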
14. Analytics
“people are usually interested in information about themselves”
from Doppler annual report
15. NoTube Beancounter
The user profiling and analytics components have been
lovingly called “Beancounter” since the early days
built on top of experience and experiments made during
the 3 years of the project
a scalable, activity-streams-oriented set of processes
• filtering, slicing, fast key lookups
• many analyses are really just “counting the beans”
• analysis deserves a high-performance architecture
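A minimal sketch of "counting the beans" over an activity stream: filtering, slicing by a field, and a key lookup. The activity fields (`user`, `verb`, `day`), the sample data and the function names are assumptions; the slides do not show the Beancounter data model.

```python
from collections import Counter, defaultdict

# Hypothetical activity stream.
activities = [
    {"user": "alice", "verb": "tweet", "day": "2012-03-26"},
    {"user": "alice", "verb": "like", "day": "2012-03-26"},
    {"user": "alice", "verb": "tweet", "day": "2012-03-27"},
    {"user": "bob", "verb": "listen", "day": "2012-03-27"},
]


def count_by(stream, key, **filters):
    """Count activities per value of `key`, keeping only those that match
    every field=value pair in `filters`."""
    return Counter(
        a[key]
        for a in stream
        if all(a.get(f) == v for f, v in filters.items())
    )


def index_by(stream, key):
    """Group activities by `key` so later lookups are a single dict access."""
    index = defaultdict(list)
    for a in stream:
        index[a[key]].append(a)
    return index


print(count_by(activities, "verb", user="alice"))
print(count_by(activities, "user", day="2012-03-27"))
print(len(index_by(activities, "user")["bob"]))
```

Every analysis here is a single pass over the stream, which is the kind of workload a key-value layout serves well.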
16. NoTube Beancounter
[Figure: Beancounter architecture. Labels recoverable from the diagram: crawler, activities, key value, analysis, profiler, profiles, engine, REST platform]
17. Acknowledgements