This paper was presented at the 5th International Conference on Social Informatics (http://www.socinfo2013.com/) in Kyoto, Japan on 27 November 2013.
The full paper can be found at: http://link.springer.com/chapter/10.1007%2F978-3-319-03260-3_25
ONTOLOGY-BASED PROFILE RESOLUTION
1. www.insight-centre.org
An Ontology-based Technique for
Online Profile Resolution
Keith Cortis, Simon Scerri, Ismael Rivera,
Siegfried Handschuh
International Conference on Social Informatics
Kyoto, Japan
27th November 2013
2. Introduction (1)
Instance Matching: determining whether two instances (representations) refer to the same real-world entity, e.g., a person
Research Challenge: discovery of multiple online profiles that refer to the same person identity on heterogeneous social networks
3. Introduction (2)
Improved profile matching system extended with:
Named Entity Recognition
Linked Open Data
Semantic Matching
Additional Benefit: Ontology used as a background schema
Advantage: Standard schema enables cross-network interoperability
4. Motivation
Contact Matcher Applications:
Control sharing of personal data
Detection of fully or partly anonymous contacts (> 83 million fake accounts)
New contact suggestions that are of direct interest to the user
5. Profile Resolution Technique
[Pipeline diagram]
Step 1: User Profile Data Extraction
Step 2: Semantic Lifting (NCO)
Step 3: Named Entity Recognition (ANNIE IE System, Large KB Gazetteer); labels shown: Name, Surname, City, Country
Step 4: Hybrid Matching Process
a) Attribute Value Matching
b) Semantic-based Matching Extension (City, Country; relation shown: country)
c) Attribute Weighting Function
Step 5: Online Profile Suggestions
Step 6: Online Profile Merging
7. Semantic Lifting
Lifting semi-/un-structured profile information
from a remote schema
Transform the information into instances of the Nepomuk Contact Ontology (NCO)
NCO represents identity-related online profile information
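The lifting step can be pictured as a field-by-field mapping from a remote schema to ontology properties. The following is a minimal sketch only: the mapping table, the remote field names, the `lift_profile` helper and the sample profile are all hypothetical, and the property names are written in an NCO style for illustration rather than taken from the authors' implementation.

```python
# Hypothetical mapping from one remote schema's field names to NCO-style
# property names; the mappings actually used by the authors are not shown
# in the slides.
FIELD_TO_NCO = {
    "name": "nco:fullname",
    "first_name": "nco:nameGiven",
    "last_name": "nco:nameFamily",
    "city": "nco:locality",
    "country": "nco:country",
}


def lift_profile(profile_id, raw_profile):
    """Return (subject, predicate, object) triples for the mapped fields."""
    triples = []
    for field, value in raw_profile.items():
        predicate = FIELD_TO_NCO.get(field)
        if predicate is None or value in (None, ""):
            continue  # unmapped or empty fields are skipped in this sketch
        triples.append((profile_id, predicate, value))
    return triples


# Made-up profile as it might arrive from a social network API.
raw = {"name": "Jane Doe", "city": "Galway", "followers": 42}
print(lift_profile("profile:1", raw))
```

Fields without a mapping (here `followers`) are simply dropped, which keeps the lifted profile limited to identity-related information.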
8. Profile Resolution Technique
[Pipeline diagram repeated, shown up to step 4a: Attribute Value Matching]
10. Profile Resolution Technique
[Pipeline diagram repeated, shown up to step 4b: Semantic-based Matching Extension]
11. Semantic-based Matching
Indirect semantic relations at a schema level
Use-case: Location-related profile attributes
Location sub-entities being semantically
compared are: city, region and country
Find the semantic relations between the sub-entities in question in a bi-directional manner
E.g. Galway (profile 1) vs. Ireland (profile 2)
[Diagram: bi-directional relations between Galway and Ireland; relation labels shown: locatedWithin, country, isPartOf, isLocationOf, containsLocation, capital, largestCity]
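A bi-directional lookup of this kind can be sketched over a small in-memory set of triples. This is an illustration under stated assumptions: the relation names come from the slide, but the toy knowledge base, the direction assigned to each triple, and the helper names `relations_between` and `semantically_related` are hypothetical. A real system would query a Linked Open Data source instead.

```python
# Toy knowledge base of (subject, relation, object) triples. The relation
# names appear on the slide; which triples hold, and in which direction,
# is assumed here for illustration.
KB = {
    ("Galway", "locatedWithin", "Ireland"),
    ("Galway", "country", "Ireland"),
    ("Ireland", "containsLocation", "Galway"),
}


def relations_between(a, b, kb=KB):
    """Return the relations linking a and b, checked in both directions."""
    forward = sorted(r for (s, r, o) in kb if s == a and o == b)
    backward = sorted(r for (s, r, o) in kb if s == b and o == a)
    return {"forward": forward, "backward": backward}


def semantically_related(a, b, kb=KB):
    """True if at least one relation links the two values in either direction."""
    found = relations_between(a, b, kb)
    return bool(found["forward"] or found["backward"])


print(relations_between("Galway", "Ireland"))
print(semantically_related("Ireland", "Galway"))
```

Checking both directions means the order in which the two profiles supply their location values does not matter.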
12. Profile Resolution Technique
[Pipeline diagram repeated, shown up to step 4c: Attribute Weighting Function]
13. Attribute Weighting Function
Approach 1: Direct Similarity Score
Name: Justin Bieber vs. J. Bieber
Similarity Value: 0.90
Approach 2: Normalised Similarity Score based on a threshold for each attribute type
Attribute Threshold for Name: 0.70
Name: Justin Bieber vs. J. Bieber
Metric Similarity Value: 0.90, Similarity Value: 1.0
Name: Justin Bieber vs. Joffrey Baratheon
Metric Similarity Value: 0.4, Similarity Value: 0.0
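The two approaches can be contrasted in a few lines. This sketch takes the metric similarity value as an input, since the slides do not name the string metric. The function names are hypothetical, and treating a value equal to the threshold as a match is an assumption.

```python
def direct_similarity(metric_value):
    """Approach 1: use the metric similarity value unchanged."""
    return metric_value


def normalised_similarity(metric_value, attribute_threshold):
    """Approach 2: 1.0 if the metric value reaches the attribute's threshold, else 0.0.

    Whether a value exactly equal to the threshold counts as a match is an
    assumption of this sketch.
    """
    return 1.0 if metric_value >= attribute_threshold else 0.0


NAME_THRESHOLD = 0.70  # attribute threshold for Name, from the slide

print(normalised_similarity(0.90, NAME_THRESHOLD))  # Justin Bieber vs. J. Bieber
print(normalised_similarity(0.4, NAME_THRESHOLD))   # Justin Bieber vs. Joffrey Baratheon
```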
14. Profile Resolution Technique
[Pipeline diagram repeated, shown up to step 5: Online Profile Suggestions]
15. Online Profile Suggestions
Similarity Threshold: 0.90
Profile pair 1:
Name: Joffrey Baratheon vs. Joff Baratheon
City: King’s Landing vs. King’s Landing
Role: King vs. King
Date of Birth: 286AL vs. 286AL
Similarity Score: 0.95
Profile pair 2:
Name: Joffrey Baratheon vs. Joffrey Bieber
City: King’s Landing vs. London, Ontario
Role: King vs. Singer
Date of Birth: 286AL vs. 01/03/1994
Similarity Score: 0.30
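One way to picture how per-attribute similarities lead to a suggestion is a weighted average compared against the similarity threshold. This is a sketch only: the slides do not give the aggregation formula or the attribute weights, so the weighted average, the weights and the helper names below are assumptions. Only the 0.90 threshold and the two overall scores come from the slide.

```python
def overall_similarity(attribute_scores, weights):
    """Weighted average of per-attribute similarity values in [0, 1].

    The aggregation formula and the weights are assumptions of this sketch.
    """
    total_weight = sum(weights[name] for name in attribute_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * score for name, score in attribute_scores.items())
    return weighted_sum / total_weight


def is_suggested(score, threshold=0.90):
    """A profile pair is suggested when its score reaches the threshold."""
    return score >= threshold


# Overall scores taken from the slide.
print(is_suggested(0.95))  # profile pair 1
print(is_suggested(0.30))  # profile pair 2

# Hypothetical weights and attribute scores, for illustration only.
weights = {"name": 3.0, "city": 1.0}
print(overall_similarity({"name": 1.0, "city": 0.0}, weights))
```

With the threshold of 0.90, the first profile pair on the slide would be suggested to the user and the second would not.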
17. Profile Resolution Technique
[Pipeline diagram repeated, shown up to step 6: Online Profile Merging]
18. Experiments & Evaluation
Two-stage evaluation:
1. Technique
a) The best attribute similarity score approach
b) Whether the NER and semantic-based matching extensions improve the overall technique
c) The computational performance of the hybrid technique against the syntactic-based one
d) A similarity threshold that determines profile equivalence within a satisfactory degree of confidence
2. Usability
e) The level of precision for the profile matching
19. Technique Evaluation
Two Datasets:
1. A controlled dataset of public profiles obtained
from the Web (LinkedIn and Twitter)
182 online profiles:
- 112 ambiguous real-world persons (common attributes)
- 70 refer to 35 well-known sports journalists
Maximised False Positives
2. Private personal and contact-list profiles
obtained from 5 consenting participants
21. Technique Evaluation – Experiment 2
String-based technique vs. String + NER + Semantic-based technique
[Two charts, one for the String technique and one for the Hybrid technique: Precision, Recall and F1-Measure (scale 0 to 1) at threshold values 0.7, 0.75 and 0.8]
The new hybrid technique improves the results considerably over the string-only one
The F-measure is more or less stable for thresholds of 0.75 and 0.8
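For reference, the three measures reported in this experiment follow the standard definitions, sketched below. The function name and the sample counts are hypothetical and are not results from the paper.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall and F1 from match counts (0.0 when undefined)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up counts for illustration only.
print(precision_recall_f1(8, 2, 2))
```

Raising the similarity threshold typically trades recall for precision, which is why the F1-Measure is reported alongside the other two at each threshold value.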
22. Technique Evaluation – Experiment 3
Computational performance of hybrid technique vs.
syntactic-only based one
For this test we selected profile pairs:
Having a number of common attributes
At least 1 attribute candidate for semantic matching
[Chart: Time (ms), 0 to 40, against Number of Common Attributes, 1 to 15, for the Syntactic and Hybrid techniques]
On average, the hybrid technique takes ≈15 ms more
24. Usability Evaluation (1)
Quantitative & Qualitative
Performance of profile matching technique
Contact matcher run against the two social networks on which the user is most active
Social Networks chosen:
Number of participants: 16
Person suggestion page
Short survey about their user experience
28. Limitations
Person’s gender is not provided by all social
network APIs
Identify gender based on first name or
surname through NER
Weights of some profile attributes e.g., first
name, surname are too high
In some cases they impact the final result too
strongly
More experiments will be conducted to fine-tune these weights
29. Future Work
Consider identification of higher degrees of
semantic relatedness
Enrich technique with other LOD cloud datasets
Additional social networks targeted
30. Conclusion
Profile matching algorithm with:
Semantic Lifting
NER on semi-/un-structured profile information
Linked Open Data to improve the NER process
Semantic matching at the schema level to find
any possible indirect semantic relations
Weighted Profile Attribute Matching
Quantitative & Qualitative Evaluation
Thank you for your attention
31. Related Work Comparison
Existing Profile Matching Approaches based on:
User’s friends
Specific Inverse Functional Properties e.g., email
address
String matching of all profile attributes
Semantic relatedness between text, depending
on remote Knowledge Bases e.g., Wikipedia
Evaluation of these Approaches:
Technique Evaluation on controlled datasets
No Usability Evaluation