Semantic Technologies to Support the User-Centric Analysis of Activity Data
1. Semantic Technologies to Support the User-Centric Analysis of Activity Data
Mathieu d’Aquin, Salman Elahi, Enrico Motta
Knowledge Media Institute, The Open University
7. Challenges in user centric activity data
• Activity data that sit in logs are
– Heterogeneous: different models for different sites/systems
– Raw: uninterpreted
– Horribly big: thousands of pieces of information generated every minute
– Hard to exploit, understand, analyze
8. User Centric Activity Data
[Diagram: within an Organisation, Websites 1 to 4 each produce their own Logs (Logs 1 to 4). Consolidation, Integration and Interpretation of these logs, supported by Ontologies, lead to Activity analysis for and by individual users (Users).]
9. User support
[Flowchart steps: User logging or register; Detect setting (agent+IP); Check setting; "It is the first time you log into UCIAD with this setting (detail), do you want to attach it to your account?"; Add setting to known setting; Register setting as related to all known ambiguous settings of the user; Display Activity Data. Branch labels: unknown setting, known setting for user, non-ambiguous, ambiguous, yes, no.]

Login screen:
User name: mathieu
Password: ******
Your current setting is:
Computer IP: 137.108.2x.1xx
User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13
This setting is not currently attached to a user, so it will be added to your known settings as you log into the system

Query:
PREFIX tr:<http://uciad.info/ontology/trace/>
PREFIX actor:<http://uciad.info/ontology/actor/>
construct {
  ?trace ?p ?x.
  ?x ?p2 ?x2.
  ?x2 ?p3 ?x3.
  ?x3 ?p4 ?x4
} where {
  <http://uciad.info/actor/mathieu> actor:knownSetting ?set.
  ?trace tr:hasSetting ?set.
  ?trace ?p ?x.
  ?x ?p2 ?x2.
  ?x2 ?p3 ?x3.
  ?x3 ?p4 ?x4
}
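The CONSTRUCT query on this slide selects the traces recorded under one of the user's known settings and follows their properties up to four steps away. A minimal Python sketch of that bounded expansion over an in-memory list of triples follows. The function name, the prefixed strings and the sample data are hypothetical, and the sketch is looser than the query: the query only matches complete four-step paths, while the sketch also keeps shorter ones.

```python
from collections import defaultdict

KNOWN_SETTING = "actor:knownSetting"
HAS_SETTING = "tr:hasSetting"


def export_user_graph(triples, user, max_hops=4):
    """Collect triples reachable from the user's traces within max_hops steps."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))

    # Settings known for the user, then the traces recorded under them.
    settings = {o for s, p, o in triples if s == user and p == KNOWN_SETTING}
    frontier = {s for s, p, o in triples if p == HAS_SETTING and o in settings}

    exported = set()
    for _ in range(max_hops):
        next_frontier = set()
        for node in frontier:
            for triple in by_subject.get(node, []):
                exported.add(triple)
                next_frontier.add(triple[2])  # continue from the object
        frontier = next_frontier
    return exported


# Hypothetical sample data: only trace:1 uses a setting known for the user.
sample = [
    ("user:mathieu", KNOWN_SETTING, "setting:1"),
    ("trace:1", HAS_SETTING, "setting:1"),
    ("trace:1", "tr:hasPageInvolved", "page:1"),
    ("page:1", "sitemap:isPartOf", "site:1"),
    ("trace:2", HAS_SETTING, "setting:2"),
    ("trace:2", "tr:hasPageInvolved", "page:2"),
]
for triple in sorted(export_user_graph(sample, "user:mathieu")):
    print(triple)
```

In a real deployment the triple store would evaluate the query itself; the sketch only illustrates which part of the graph ends up in the export.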
10. User support
Export my data for graph http://uciad.info/users/mathieu

<rdf:RDF>
<rdf:Description rdf:about="http://uciad.info/trace/kmi-web13/ede2ab38da27695eec1e0b375f9b20da">
  <rdf:type rdf:resource="http://uciad.info/ontology/trace/Trace"/>
  <hasAction rdf:resource="http://uciad.info/action/GET"/>
  <hasPageInvolved rdf:resource="http://uciad.info/page/0b9abc62fcf90afc53797b938af435dd"/>
  <hasResponse rdf:resource="http://uciad.info/response/ea95add1414aba134ff9e0482b921a33"/>
  <hasSetting rdf:resource="http://uciad.info/actorsetting/119696ec92c5acec29397dc7ef98817f"/>
  <hasTime rdf:datatype="http://www.w3.org/2001/XMLSchema#string">13/Jun/2011:01:37:23+0100</hasTime>
</rdf:Description>
<rdf:Description rdf:about="http://uciad.info/page/0b9abc62fcf90afc53797b938af435dd">
  <rdf:type rdf:resource="http://uciad.info/ontology/sitemap/WebPage"/>
  <isPartOf rdf:resource="http://uciad.info/ontology/test1/dataopenacuk"/>
  <onServer rdf:resource="http://kmi-web13.open.ac.uk"/>
  <url rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/resource/person/ext-718a372e10788bb58d562a8bf6fb864e</url>
</rdf:Description>
<rdf:Description rdf:about="http://uciad.info/ontology/test1/dataopenacuk">
  <rdf:type rdf:resource="http://uciad.info/ontology/sitemap/Website"/>
  <rdf:type rdf:resource="http://uciad.info/ontology/test1/LinkedDataPlatform"/>
  <onServer rdf:resource="http://kmi-web13.open.ac.uk"/>
  <urlPattern rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/*</urlPattern>
</rdf:Description>
<rdf:Description rdf:about="http://uciad.info/response/ea95add1414aba134ff9e0482b921a33">
  <rdf:type rdf:resource="http://uciad.info/ontology/trace/HTTPResponse"/>
  <hasResponseCode rdf:resource="http://uciad.info/ontology/trace/200"/>
  <hasSizeInBytes rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1085</hasSizeInBytes>
</rdf:Description>
</rdf:RDF>
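The exported trace stores its time as a plain string, in what looks like the timestamp layout of Apache-style access logs. A small Python sketch, assuming that layout holds for every trace, of turning the string into a timezone-aware value that could then be serialized as an ISO 8601 date-time. The function name is hypothetical.

```python
from datetime import datetime

# Day/month-abbreviation/year:hour:minute:second followed by a UTC offset.
# Note: %b relies on English month abbreviations (the default "C" locale).
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S%z"


def parse_trace_time(value: str) -> datetime:
    """Parse a hasTime literal such as '13/Jun/2011:01:37:23+0100'."""
    return datetime.strptime(value, LOG_TIME_FORMAT)


stamp = parse_trace_time("13/Jun/2011:01:37:23+0100")
print(stamp.isoformat())  # 2011-06-13T01:37:23+01:00
```

Typed date-time values make it possible to sort and filter traces by time without string manipulation.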
12. Ontologies
Formal conceptual models of a domain: online user activity
Key Concepts:
– Actor: the things accessing resources (through agents)
– Resources: Webpages, Websites
– Actions: realized by actors on resources, e.g., requests
– Events: an actor realizing an action on a resource
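To make the four key concepts concrete, here is an illustrative Python sketch that mirrors them as plain data classes. It is not the UCIAD ontology itself: the class fields, the `describe` helper and the sample values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    name: str
    agent: str  # the tool through which the actor accesses resources


@dataclass(frozen=True)
class Resource:
    url: str
    kind: str  # "WebPage" or "Website"


@dataclass(frozen=True)
class Action:
    name: str  # e.g. a request such as "GET"


@dataclass(frozen=True)
class Event:
    """An actor realizing an action on a resource."""
    actor: Actor
    action: Action
    resource: Resource

    def describe(self) -> str:
        return f"{self.actor.name} performed {self.action.name} on {self.resource.url}"


event = Event(
    actor=Actor(name="mathieu", agent="Chrome"),
    action=Action(name="GET"),
    resource=Resource(url="http://data.open.ac.uk/query", kind="WebPage"),
)
print(event.describe())
```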
14. User support
[Flowchart steps: User logging or register; Detect setting (agent+IP); Check setting; "It is the first time you log into UCIAD with this setting (detail), do you want to attach it to your account?"; Add setting to known setting; Register setting as related to all known ambiguous settings of the user; Display Activity Data. Branch labels: unknown setting, known setting for user, non-ambiguous, ambiguous, yes, no.]
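The flow on this slide can be approximated in a few lines of Python. This is only a sketch under stated assumptions: a setting is taken to be the pair (user agent, IP address), and "ambiguous" is read here as "already attached to a different user", which the slide does not define. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

Setting = Tuple[str, str]  # (user agent, IP address)


def check_setting(known: Dict[Setting, Set[str]], user: str, setting: Setting) -> str:
    """Return 'known', 'ambiguous' or 'unknown' for this user and setting."""
    users = known.get(setting, set())
    if user in users:
        return "known"
    if users:
        # ASSUMPTION: a setting attached to someone else counts as ambiguous.
        return "ambiguous"
    return "unknown"


def attach_setting(known: Dict[Setting, Set[str]], user: str, setting: Setting) -> None:
    """Record the setting as one of the user's known settings."""
    known.setdefault(setting, set()).add(user)


known_settings: Dict[Setting, Set[str]] = {}
laptop: Setting = ("Mozilla/5.0", "192.0.2.1")  # documentation-range IP
print(check_setting(known_settings, "mathieu", laptop))
attach_setting(known_settings, "mathieu", laptop)  # the user answered "yes"
print(check_setting(known_settings, "mathieu", laptop))
```

Once a setting is known for the user, traces recorded under it can be displayed as that user's activity data.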
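15. Authenticated SPARQL
[Diagram: a client sends a query and credentials over HTTP + basic auth to a protected SPARQL endpoint, which offers a standard interface with authentication. The protected endpoint uses access right info (User->graphs) to restrict the query to the user's graphs, passes it to the SPARQL endpoint and returns the SPARQL results.]
Query sent by the client:
Select ?x where {?x a uciad:Website}
Credentials:
User=mathieu
Pass=mypass
Access right info (User->graphs): mathieu? matgraph, onto
Query sent to the SPARQL endpoint:
Select ?x From matgraph,onto where {?x a uciad:Website}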
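The two mechanisms on this slide, basic authentication and restricting a query to the graphs a user may read, can be sketched in Python as follows. The graph IRIs, the `USER_GRAPHS` table and both function names are hypothetical, and the rewriting is deliberately naive (a regular expression on the first WHERE keyword); a production proxy would parse the query properly.

```python
import base64
import re

# Hypothetical access-right table: user -> graphs the user may query.
USER_GRAPHS = {
    "mathieu": [
        "http://example.org/graph/mathieu",
        "http://example.org/graph/onto",
    ],
}


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Authorization header for basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def restrict_to_graphs(query: str, graphs) -> str:
    """Insert one FROM clause per allowed graph before the first WHERE."""
    from_clauses = " ".join(f"FROM <{g}>" for g in graphs)
    return re.sub(
        r"\bwhere\b",
        lambda match: f"{from_clauses} WHERE",
        query,
        count=1,
        flags=re.IGNORECASE,
    )


rewritten = restrict_to_graphs(
    "SELECT ?x WHERE { ?x a uciad:Website }", USER_GRAPHS["mathieu"]
)
print(rewritten)
print(basic_auth_header("mathieu", "mypass"))
```

Keeping the rewriting on the server side means the client can use a standard SPARQL interface and never needs to know which graphs back its account.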
16. Customizing the Ontologies = Customizing the Analysis
The User Activity Ontologies form the basis to describe generic activity data in a sharable way
Customized extensions:
– Specific categories of resources, actions and events
– Formally defined to allow inference
Create customized aggregations, classifications and distributions in the data that allow for specific analyses
[Diagram labels: Base Activity Ontologies; User Activity Data; Inference; Specific Classifications, Distributions, Aggregations…]
17. Examples
In the ontology:
1. vhs-wiki is a Wiki
2. Data.open.ac.uk is a DataPlatform
3. Actions on a Page which is part-of a Wiki are called usingWiki
4. Similarly for usingDataPlatform
And…
1. Activities usingAWiki with a user-agent which is an RSS-Reader are checkingWikiUpdatesWithRSS
2. Otherwise, they are usingWikiThroughBrowser
[Charts: "Sub-classes of u…" (vertical label, truncated); "Pages involved in usingWikiThroughBrowser"]
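The rules on this slide are expressed in the ontology and applied by a reasoner. As a rough illustration only, the same decisions can be written as ordinary Python conditionals; the `Activity` class, its fields and the agent-type labels are hypothetical stand-ins for what the ontology would infer.

```python
from dataclasses import dataclass
from typing import List

# Site classifications taken from the slide.
WIKIS = {"vhs-wiki"}
DATA_PLATFORMS = {"data.open.ac.uk"}


@dataclass
class Activity:
    website: str     # site the accessed page is part of
    agent_type: str  # hypothetical labels such as "RSS-Reader" or "Browser"


def classify(activity: Activity) -> List[str]:
    """Return the activity classes that the slide's rules would assign."""
    site = activity.website.lower()
    classes: List[str] = []
    if site in WIKIS:
        classes.append("usingWiki")
        if activity.agent_type == "RSS-Reader":
            classes.append("checkingWikiUpdatesWithRSS")
        else:
            classes.append("usingWikiThroughBrowser")
    elif site in DATA_PLATFORMS:
        classes.append("usingDataPlatform")
    return classes


print(classify(Activity(website="vhs-wiki", agent_type="RSS-Reader")))
print(classify(Activity(website="Data.open.ac.uk", agent_type="Browser")))
```

Declaring the rules in the ontology rather than in code is what lets each user customize the classes, and therefore the analysis, without changing the software.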
18. Examples
In the Ontology
1. The page http://data.open.ac.uk/query is a SPARQLEndpoint
2. An action on a SPARQLEndpoint with a query parameter is ExecutingASparqlQuery
3. Pages of the form http://data.open.ac.uk/page/* are DataPages
4. An action on a DataPage with a BrowserAgent is ConsultingADataPage
[Charts: "Sub-classes of usingDataPlatform"; "Settings used in executingASparqlQuery" (the most used is curl on the user's laptop); "Sub-classes of DataPages consulted by the user"]
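The four definitions above combine a URL pattern with a condition on the request or the agent. A hedged Python sketch of the same logic, assuming that "a query parameter" means a URL parameter named `query` and that agent types arrive as simple labels; both functions are hypothetical and stand in for ontology-based inference.

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SPARQL_ENDPOINT = "http://data.open.ac.uk/query"
DATA_PAGE_PATTERN = "http://data.open.ac.uk/page/*"


def classify_page(url: str) -> Optional[str]:
    """Classify a page from its URL, ignoring any query string."""
    base = url.split("?", 1)[0]
    if base == SPARQL_ENDPOINT:
        return "SPARQLEndpoint"
    if fnmatchcase(base, DATA_PAGE_PATTERN):
        return "DataPage"
    return None


def classify_action(url: str, agent_type: str) -> Optional[str]:
    """Classify an action from the page it targets and the agent used."""
    page = classify_page(url)
    # ASSUMPTION: the "query parameter" is a URL parameter named "query".
    if page == "SPARQLEndpoint" and "query" in parse_qs(urlsplit(url).query):
        return "ExecutingASparqlQuery"
    if page == "DataPage" and agent_type == "BrowserAgent":
        return "ConsultingADataPage"
    return None


print(classify_action("http://data.open.ac.uk/query?query=SELECT+%3Fx+WHERE+%7B%7D", "curl"))
print(classify_action("http://data.open.ac.uk/page/course/m366", "BrowserAgent"))
```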
19. Browsing Interface: LDI
[Screenshot annotations: Class; Sub-classes with distribution of instances; Properties with distribution of Values; List of members (instances); Details of a member (instance)]
20. Conclusion
• The idea of the UCIAD project was to investigate
and experiment with the use of semantic
technologies for the user centric integration of
activity data
• Demonstrated the value of the approach, as well
as current technical limitations:
– Scalability
– Flexible Access-control
– Usability
21. Future Work/Next Steps
• User studies: what can people do with their
activity data? In which form?
• Scenarios for user centric activity data
– Project Danube, Higgins, Mydex, personal.com, …
with semantics?
• Licensing User Data?
22. Future Work/Next Steps
Our previous work on using a local proxy to collect information on user generated Web traffic…
… and linking this information to web resources…
… to create online personal information/personal analytics interfaces…

[Poster] Personal Monitoring of Web Information Exchange: Towards Web Lifelogging
Mathieu d’Aquin, Salman Elahi and Enrico Motta – m.daquin@open.ac.uk

With more and more services relying on the Web to communicate with their users, the amount of information exchanged daily by an individual through various Web channels has become difficult to control. While in principle this gives better possibilities to share and exchange information with various people and organizations, it also makes it more difficult for Web users to fully comprehend, explore and exploit exchanges of their own data.

We developed a Web lifelogger, dedicated to tracking every exchange realized over the Web by an individual Web user, and to storing these logs using semantic technologies. We ran an experiment on using such a tool for a period of 2.5 months for a particular user. The collected data (100M Triples) can be used by the user to monitor and study his own online behavior, based in particular on basic analytics, on models of the perceived trust relationship this user has with different websites, and on what can be learnt from analyzing the use of Web search engines.

Basic Analytics
• Number of requests per hour of the day (Sum). Makes it possible to identify events appearing on a typical day.
• Map of the locations of the servers where requests have been sent. Makes it possible to identify the physical space of Web interactions.
• Cloud of the most commonly accessed websites. Shows the impact of ‘implicit’ requests.
• 49 different tools accessing the Web (User-Agent) can be identified, including Web browsers, twitter clients, e-mail clients, update utilities, social applications, etc.

Trust in Domains and Criticality of Data
A simple iterative model is defined to compute the perceived trust in websites (top), and the perceived criticality of personal data (bottom), based on observing the exchange of this data. The simple intuition on which we rely is that a trusted website receives critical data, and that critical data is shared only with a few trusted websites. Exposing this model to the user in an interactive way can help align the perceived behavior with the intended one, and detect possible conflicts between data exchange and personal privacy rules.

Analyzing Search History
Web search history is known to provide interesting indications of the user’s interests. Using OpenCalais SemanticProxy (http://semanticproxy.opencalais.com/), we detect general themes from the analysis of search keywords, directly pointing to additional resources. Also, we see patterns emerging from the use of search engines, in terms of navigational and informational searches.
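The poster states the intuition behind the trust and criticality model but not its formulas. The Python sketch below is therefore one possible reading of that intuition, not the authors' model: the update rules, the normalization and the sample exchanges are all assumptions made for illustration.

```python
from collections import defaultdict


def trust_and_criticality(exchanges, rounds=10):
    """Iteratively score sites (trust) and data items (criticality).

    exchanges is an iterable of (data_item, site) pairs, each meaning that
    the data item was observed being sent to the site.
    """
    receivers = defaultdict(set)  # data item -> sites that received it
    items = defaultdict(set)      # site -> data items it received
    for item, site in exchanges:
        receivers[item].add(site)
        items[site].add(item)
    if not receivers:
        return {}, {}

    trust = {site: 1.0 for site in items}
    criticality = {item: 1.0 for item in receivers}
    for _ in range(rounds):
        # ASSUMED rule: critical data goes to few sites, and trusted ones.
        for item, sites in receivers.items():
            mean_trust = sum(trust[s] for s in sites) / len(sites)
            criticality[item] = mean_trust / len(sites)
        top = max(criticality.values())
        criticality = {k: v / top for k, v in criticality.items()}

        # ASSUMED rule: a site is trusted as much as the data it receives is critical.
        for site, received in items.items():
            trust[site] = sum(criticality[i] for i in received) / len(received)
        top = max(trust.values())
        trust = {k: v / top for k, v in trust.items()}
    return trust, criticality


observed = [
    ("password", "bank.example"),
    ("email", "bank.example"),
    ("email", "shop.example"),
    ("email", "news.example"),
]
site_trust, data_criticality = trust_and_criticality(observed)
print(site_trust)
print(data_criticality)
```

With these sample exchanges, the site that alone receives the password ends up with the highest trust, and the password ranks as more critical than the widely shared e-mail address.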
23. Resources
[Dashboard mockup. Labels: Sites, Time, Entities, Profile, Locations, Graph View, Languages, Filters]
Time: Friday 14th October 2011 (number of requests); views By Hour, By Week, By Month
Entities:
– People: Peter Scott, Kurt Cobain, Adele, Ashley MacIsaac, Steve Jobs, Bach, Vincent Cassel, Enrico Motta, Virginia Woolf, Terry Pratchett, Jane Austen, William Gibson, Neil Gaiman, Martin Bean, Nicolas Sarkozy, Fouad Zablith, David Cameron, Marta Sabou, Michael Jackson, Jimi Hendrix, Tim Berners Lee, Stuart Brown, Carlo Allocca, Scott Adams
– Organizations: British Broadcasting Corporation, The Guardian, The Open University, Joint Information Systems Committee, Engineering and Physical Sciences Research Council, Google, Amazon, La compagnie des branques, Facebook, Arts and Humanities Research Council, Knowledge Media Institute, Wikimedia Foundation, Agence Nationale de la Recherche, Apple, European Commission
– Places: United Kingdom, Euston, Walton Hall, France, Paris, Luxembourg, Heathrow, Metz, Nancy, Birmingham, Coulsdon, New York, London, Washington, Manchester, Dublin, Bonn, Dusseldorf, Rome, Thionville, Chamonix, Milton Keynes, Mont Blanc, England, Alderaan, Nice, Gare de l'Est, Croydon, Saint Pancras, Bletchley, Luton
– Other Keywords: Education, Semantics, iPad, Summer School, Semantic Web, Cajon, Case-Based Reasoning, Artificial Intelligence, Dataset, PHP, Data Mining, School, University, Educational Resources, OpenLearn, SocialLearn, Ontologies, OWL, Editor, Journal, Conference, Linked Data, Teaching, Music, Workshop, iPhone, Java, Javascript, Discovery, RDF, Guitar, Pirates
Languages: English 68%, French 24%, German 5%, Italian 2%
24. Resources
[Same dashboard mockup as on slide 23, now with a profile pop-up opened on a person entry. The Sites, Time, People, Organizations, Places and Other Keywords panels show the same entries as on slide 23.]
Profile pop-up:
Enrico Motta
Professor at the Knowledge Media Institute
Relation to you: Colleague, Friend, Line Manager
Languages: English 86%, Italian 14%, German 5%, French (value not legible)
25. More info
UCIAD Blog: http://uciad.info
Code base: http://github.com/uciad
Twitter: #uciad
@mdaquin