This presentation is about our paper which was presented at the Hypertext conference 2008.
Abstract: In recent literature, several models were proposed for reproducing and understanding the tagging behavior of users. They all assume that the tagging behavior is influenced by the previous tag assignments of other users. But they are only partially successful in reproducing characteristic properties found in tag streams. We argue that this inadequacy of existing models results from their inability to include user’s background knowledge into their model of tagging behavior. This paper presents a generative tagging model that integrates both components, the background knowledge and the influence of previous tag assignments. Our model successfully reproduces characteristic properties of
tag streams. It even explains effects of the user interface on the tag stream.
Links to the paper:
http://dx.doi.org/10.1145/1379092.1379109
http://www.uni-koblenz.de/~staab/Research/Publications/2008/DellschaftStaabHypertext08.pdf
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
An Epistemic Dynamic Model for Tagging Systems
1. <is web> Information Systems & Semantic Web
University of Koblenz ▪ Landau, Germany
An Epistemic Dynamic Model for Tagging Systems
Klaas Dellschaft and Steffen Staab
{klaasd, staab}@uni-koblenz.de
2. Delicious Tagging Interface <is web>
How do users influence each other in tagging systems?
Hypothesis: Tagging involves imitation of other users AND
selection of tags from background knowledge of users.
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 2 of 24
3. Understanding Dynamics in Tagging Systems <is web>
User interface Something else?
Tagging
Conceptualization Behavior
Comparison
Shared
of Statistics
Own Background
Knowledge terminology
Model of User Interface Influence
Simulated
Joint Stochastic Model Tagging
Behavior
Model of Own Model of Sharing
Background Knowledge
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 3 of 24
4. Folksonomies <is web>
Vertexes: Users, tags, resources
Hyperedges: Tag assignments (user X tag X resource)
Postings:
Tag assignments of a user to a single resource
Can be ordered according to their time-stamp
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 4 of 24
5. Co-occurrence Streams <is web>
Co-occurrence Streams:
All tags co-occurring with a given tag in a posting
Ordered by posting time
Example tag assignments for ‘ajax':
{mackz, r1, {ajax, javascript}, 13:25}
{klaasd, r2, {ajax, rss, web2.0}, 13:26}
{mackz, r2, {ajax, php, javascript}, 13:27}
Resulting co-occurrence stream:
javascript rss web2.0 php javascript
time
Tag |Y| |U| |T| |R|
ajax 2.949.614 88.526 41.898 71.525
blog 6.098.471 158.578 186.043 557.017
xml 974.866 44.326 31.998 61.843
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 5 of 24
6. Co-occurrence Streams – Tag Growth <is web>
Linear scaling of axes
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 6 of 24
7. Co-occurrence Streams – Tag Frequencies <is web>
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 7 of 24
8. Resource Streams <is web>
Resource Streams:
All tags assigned to a resource
Ordered by posting time
Example tag assignments:
{mackz, r1, {ajax, javascript}, 13:25}
{klaasd, r2, {ajax, rss, web2.0}, 13:26}
{mackz, r2, {ajax, php, javascript}, 13:27}
Resulting resource stream:
ajax rss web2.0 ajax php javascript
time
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 8 of 24
9. Resource Streams – Tag Frequencies <is web>
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 9 of 24
10. Resource Streams – Tag Frequencies <is web>
Logarithmic scaling of axes
Taken from Halpin, Robu and Shepard, WWW 2007
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 10 of 24
11. Existing Models <is web>
Frequency Rank
Co-occur. Streams Resource Streams Tag Growth
Polya Urn Model O O Fixed size
Simon Model O O Linear
YS Model w/ Memory + O Linear
Halpin et al. Model O O Linear
Observed Properties Long tail w/ cut-off Drop around tag 7 Sublinear
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 11 of 24
12. Stochastic Models of Influence <is web>
User interface Something else?
Tagging
Conceptualization Behavior
Comparison
Shared
of Statistics
Own Background
Knowledge terminology
Model of User Interface Influence
Simulated
Joint Stochastic Model Tagging
Behavior
Model of Own Model of Sharing
Background Knowledge
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 12 of 24
13. Modeling Tag Imitation <is web>
Delicious Tagging Interface
Imitation of previous tag assignments
Free input of new tags (taken from background knowledge of a user)
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 13 of 24
14. Simulating a Tag Stream <is web>
Start with empty tag stream
Each simulation step appends a new tag assignment:
p(w|t): Probability of selecting word w for topic t.
Modeled by word distributions in a topic centered
text corpus.
K
PB
1-
P
BK
n: Number of visible previous tags.
h: Maximal number of previous tag assignments
used for determining ranking of the n distinct tags.
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 14 of 24
15. Modeling Background Knowledge <is web>
Text Corpora Del.icio.us
Text Corpora
PBK: Probability of selecting from background knowledge
p(w|t): Probability of selecting word w for topic t. Modeled by
word distributions in a topic centered text corpus.
p(w|r): Probability of selecting word w for resource r.
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 15 of 24
16. Modeling Tag Imitation <is web>
PBK
t t-1 t-2 t-3 t-4 t-5 … t-h …
1-PBK
1 2 3 … n
PI = 1 - PBK: Probability of imitating a previous tag assignment
n: Number of visible top-ranked tags
h: Maximal number of previous tag assignments used for determining
ranking of the n distinct tags
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 16 of 24
17. Simulation Results <is web>
User interface Something else?
Tagging
Conceptualization Behavior
Comparison
Shared
of Statistics
Own Background
Knowledge terminology
Model of User Interface Influence
Simulated
Joint Stochastic Model Tagging
Behavior
Model of Own Model of Sharing
Background Knowledge
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 17 of 24
18. Simulating Co-occurrence Streams <is web>
Tag growth:
Influenced by PBK and p(w|t)
Tag Frequencies:
Influenced by PBK, p(w|t), n, h
n: Semantic breadth of a topic (blog: 100 tags,
ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
h: No hint for realistic values. Good guess may be a value
around 1000.
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 18 of 24
19. Co-occ. Streams – Simulated Tag Growth <is web>
PI: Imitation PI PBK
PBK: Background Knowl.
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 19 of 24
20. Co-occ. Streams – Simulated Tag Frequencies <is web>
PI: Imitation
PI PBK
PBK: Background Knowl. PI PBK
PI PBK
n: Visible Tags
h: Available History
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 20 of 24
21. Simulating Ajax, Blog and XML <is web>
PI: Imitation (PI
(PI
PBK: Background Knowl.
(PI
n: Visible Tags
h: Available History
blog
ajax
Observation
xml
Simulation
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 21 of 24
22. Res. Streams – Simulated Tag Frequencies <is web>
PI: Imitation
PI PBK
PI PBK
PBK: Background Knowl.
n: Visible Tags
h: Available History
Logarithmic scaling of axes
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 22 of 24
23. Conclusions <is web>
• Imitation AND background knowledge are needed for
explaining properties of tag streams
• Probability of imitating previous tag assignments:
~60-90%
Frequency Rank
Co-occur. Streams Resource Streams Tag Growth
Polya Urn Model O O Fixed size
Simon Model O O Linear
YS Model w/ Memory + O Linear
Halpin et al. Model O O Linear
Epistemic Model + + Sublinear
Observed Properties Long tail w/ cut-off Drop around tag 7 Sublinear
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 23 of 24
24. <is web>
Data sets and simulation software:
http://isweb.uni-koblenz.de/Research/Tagdataset
http://www.tagora-project.eu
ISWeb - Information Systems Klaas Dellschaft Hypertext 2008
& Semantic Web klaasd@uni-koblenz.de 24 of 24