1. Steffen Staab Web Science 1Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Web and Internet Science Group · ECS · University of Southampton, UK &
Introduction to Web Science
Steffen Staab
University of Southampton &
Universität Koblenz-Landau
3. Steffen Staab Web Science 3
9.00 am 10.30 am 3.00 pm
Thu Introduction to Web
Science
Noshir Contractor Tutorial Poster session
Fri Internet and Law Nikolaus Forgo Tutorial Project work
Sat Web and Politics Stéphane Bazan Tutorial Project work
Mon Social Machines Jim Hendler Tutorial Project work
Tue Web Entrepreneurship Simon Köhl Tutorial Project work
Wed Computational Social
Science
Nuria Oliver Tutorial Project presentation
Program Overview
Details at http://wwsss16.webscience.org/schedule
4. Steffen Staab Web Science 4
• Work independently and self-organised
• Get feedback from your advisor
• Choose your working space:
– D 238, D 239, A 308, B 005 on weekdays
– E 412, E 413, E 414, A 308, B 005 on Saturday
– or anywhere you like
• Final presentation on Wednesday afternoon
(10 minutes per team)
Project work
Details at http://wwsss16.webscience.org/schedule
5. Steffen Staab Web Science 5
Social Events
Thu Cocktails at city beach 6.00 pm Meet at main campus entry
Fri SommerUni party 8.00 pm Campus
Sat Free time Old town, KaleidosKOp festival,
fortress, beer gardens, ...
Sun Excursion (few seats available!) 10.45 am Meet at main campus entry
Quarter final France vs. Island 9.00 pm Campus
Mon Free time Old town, fortress, beer
gardens, ...
Tue Barbecue 7 pm Campus
SommerUni party 9 pm Campus
Wed SommerUni barbecue 6 pm Campus
6. Steffen Staab Web Science 6
Thursday
Noshir Contractor:
Leveraging Web/Internet/Network
Sciences (WINS) to address
Grand Societal Challenges
9.00 am
10.30 am
Steffen Staab and Jérôme Kunegis:
Introduction to Web Science
3.00 pm
Poster session
6.00 pm
Cocktails at city beach
meeting point: main campus entrance
Breaks:
• 10 am,
• 12 am,
• 2.30 pm
7. Steffen Staab Web Science 7
Friday
Nikolaus Forgó:
Privacy Law and the Web: A Story
of Love and Hate?
9.00 am
10.30 am
Rüdiger Grimm:
Internet and Law
3.00 pm
Supervised project work
8.00 pm
SommerUni party
At the campus – impossible to miss!
Breaks:
• 10 am,
• 12 am,
• 2.30 pm
10. Steffen Staab Web Science 10
Produce
Consume
Cognition
Emotion
Behavior
Socialisation
Knowledge
Observable
Micro-
interactions
in the Web
Apps
Protocols
Data & Information
Governance
WWW
Observable
Macro-
effects in the
Web
Web Science
11. Steffen Staab Web Science 11
Tertium Datur or Where Dijkstra was wrong
Computer
Science
Science
about
Computers
Astronomy
Telescope
Science
Web
Science
Science
about the
Web
What is Web Science?
12. Steffen Staab Web Science 12
Web science is an emerging interdisciplinary field
concerned with the study of large-scale
socio-technical systems, such as the World Wide
Web. It considers the relationship between people and
technology, the ways that
society and technology co-constitute one another
and the impact of this co-constitution
on broader society.
Wikipedia, 2016-06-29
Definition of Web Science
13. Steffen Staab Web Science 13
Agenda
• What is Web Science?
• What is the Web?
– Aspects of the Web at Large
• How to investigate the Web?
– Observing the Web
– An example using the architecture: bias in the Web
– How to model aspects of the Web
• What is the past and the future of the Web?
14. Steffen Staab Web Science 14
I try to
• classify, describe commonalities, find generalizations
I will not
• be 100% correct, 100% complete
BUT
• Please ask, suggest, complement,...
WHENEVER you feel like it!
DON‘T HESITATE ASKING!
A Note
16. Steffen Staab Web Science 16
What is the Web: Aspects of the Web at Large
Architecture methodology:
• Draw pictures in different dimensions
• Pictures do not compete with, but complement each
other
Not only technical pictures!
17. Steffen Staab Web Science 17
The Web as a Device
Software
• Browsers
– IE, Firefox, Chrome,...
• Web Servers
– Apache, Tomcat,..
• Content Management
and Data Delivery
– Wordpress, drupal,
databases...
• Search Engines
• ...
Standards
• Uniform Resource
Locator (URL)
• HyperText Transfer
Protocol (http)
• HyperText Markup
Language (html)
• Domain name service
• ...+ many more
18. Steffen Staab Web Science 18
The Web as Content
For human consumpton
(primarily)
Text, Hypertext
Images
Video
Audio
Multimedia
Interaction (Games...)
Braille
Mathematics
For machine consumption
(primarily)
Metadata
Data
Ontologies
19. Steffen Staab Web Science 19
The Web as Content
Free
• Informational
• Advertisements
• Goods & services
• Web 2.0
– Wikipedia
– facebook
Paid
• Individual payments
• Subscription
• Micropayment
Freemium
20. Steffen Staab Web Science 20
The Web and its Stakeholders
People
• Citizens
• Customers
• Leisure seekers
• Workers
• Software developers
Internet providers
• Landline
• Mobile
• Nested providers
(internet cafe...)
• Website hosts
• Peer2peer networks (Bittorrent...)
Platform operators
• Shops
• News
• Web 2.0
• Payment
• Advertisement networks
• Trust centers
Government
• Police
• Military
• Secret service
• Law
• Citizen services
• Administration
• Politics
21. Steffen Staab Web Science 21
The Web as a Process
Governing
• Standards processes
– W3C
– Internet Engineering
Task Force (IETF)
– RFC
• Domain name
registration
• Internet routing
– E.g. „great Chinese
firewall“
Regulation
• Legal
– copyright
• where enforced
– hate speech
– ...
• Private
– E.g. Facebook,
Instagram ... Pictures,
nipple double standard
22. Steffen Staab Web Science 22
Observing the Web as
a Medium and Mirror of (anti-)social Practices
• (Self-)Expression
• Dark Web
– Crime
– Gold farming
– Violence
– Pornography
– Identity theft
• Sex lifes
– Fetishes
– prostitution
• Relationships
– Breakups
– Mobbing
– Stalking
– Advising
– Counseling
– Democracy
23. Steffen Staab Web Science 23
The Web as Medium & Mirror of
NEW (?) (anti-)Social Practices
Automation
• Twitter bots
• Online dating bots
• Smart contracts
(„Code is law“)
– E.g. DRM
• Quantified everything
– Physique
– Psyche
• Infer your Big 5
personality traits from
your facebook profile
• Threats to Internet
Security & Safety
– Hacking
– Social engineering
24. Steffen Staab Web Science 24
Produce
Consume
Cognition
Emotion
Behavior
Socialisation
Knowledge
Observable
Micro-
interactions
in the Web
Apps
Protocols
Data & Information
Governance
WWW
Observable
Macro-
effects in the
Web
Web Science: Discipline or Transdisciplinary Endeavour?
28. Steffen Staab Web Science 28
Challenges – Data Collection Issues
Legal and/or Ethical
• Crawling
– May be disallowed by provider
• Usage logging
– Privacy of individuals
• Even if it is allowed....
29. Steffen Staab Web Science 29
Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
• Unreachability
• Time outs
30. Steffen Staab Web Science 30
Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
– Where to start?
• We cannot observe everything!
– Even just for data size!
– What appear to be most fruitful starting points?
31. Steffen Staab Web Science 31
Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
– Where to start?
– Where to stop?
• Each crawl is a view
– Twitter
» Tweet
• URL
• Web Page
• Subweb
» Followers
• Followers‘ Followers
• ...
32. Steffen Staab Web Science 32
Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
– Where to start?
– Where to stop?
– Synchronous vs asynchronous
• Strictly speaking: only asynchronous crawling possible
– But in [Dellschaft&Staab] we targeted the construction of models for
streams of tags
33. Steffen Staab Web Science 33
Challenges – Data Publishing Issues
Legal and/or Ethical Example Issues
• AOL query log
• Netflix challenge
• Delicious
– http://www.tagora-project.eu/data/
• Twitter
– Collecting, but no sharing
• SocialSensor project
34. Steffen Staab Web Science 34
Challenges – Data Publishing Issues
Technical/Modelling issues
• Generic format, e.g. RDF
• Format ready for digestion by a certain software, e.g.
for Matlab processing
• Openness to other data
– E.g. references to DBPedia/Wikipedia
• Accuracy of publishing
– http://me.org showed „...“
– http://me.org showed „...“@2013-05-01:0900CEST
– http://me.org showed „...“@2013-05-01:0900CEST called
from IP 193.99.144.85 using browser...version...history...
35. Steffen Staab Web Science 35
Sharing Software
• Software
– For crawling or usage logging
– Rather than sharing the data, share the code for observing
• Example:
code for crawling Twitter in a certain way
• Issues
– Limited repeatability
– Disturbance liability („Störerhaftung“) – at least in DE
• If you provide source code for crawling, e.g., Facebook, even if you
do not crawl FB, FB can sue you
39. Steffen Staab Web Science 39
Bias in the Device
Accessibility
• Impaired eyesight
• Impaired use of
mouse and keyboard
• ...
HTML5 semantics helps
– but is not used much
40. Steffen Staab Web Science 40
Bias in the Device
Accessibility
• Impaired eyesight
• Impaired use of
mouse and keyboard
• ...
HTML5 semantics helps
– but is not used much
Haris Aslanidis
41. Steffen Staab Web Science 41
Bias in the Device
Accessibility
• Impaired eyesight
• Impaired use of mouse and
keyboard
• ...
HTML5 semantics helps – but is
not used much
http://west.uni-
koblenz.de/en/research/projects/
mamem
http://www.mamem.eu/mamem-
makes-available-gazetheweb-
browse-application/
42. Steffen Staab Web Science 42
Search engines
• Categorizing people and animals
– White vs black
– http://www.nytimes.com/2016/06/26/opinion/sunday/artificia
l-intelligences-white-guy-problem.html?_r=0
• Job advertisements
– Well-paid job not offered to females
Bias in the Software
43. Steffen Staab Web Science 43
Bias in Content/Data
Credit Hire Sex Ethnic Zip Height ... ...
+ +
+ -
- +
+ +
- -
correlated
Data protection laws
suggest not to process
sensitive data attributes
like „sex“ or „ethnic“
49. Steffen Staab Web Science 49
• The Web reflects current and past discrimination!
• Example:
– Predict who will leave the company based on email
features (ICWSM-16)
• People with outsiders‘ vocabulary more prone to fail
• Unresolved:
– Do they fail because they are less adaptable or
– Do they fail because the environment is hostile to outsider?
Bias and Social Practices
50. Steffen Staab Web Science 50
Wikipedia
• Efforts to counter bias
Law
• E.g. UK equality act
• Protected characteristics:
– Age, disability, gender, marriage, religion,...
Bias and Processes
51. Steffen Staab Web Science 51
Biases in the Social Machine:
The Case of Liquid Feedback
53. Steffen Staab Web Science 53
Online Delegative Democracy
CC-BY-SA Ilmari Karonen
54. Steffen Staab Web Science 54
Delegative Democracy
• Between direct and representative democracy
CC-BY-SA Ilmari Karonen
55. Steffen Staab Web Science 55
Delegative Democracy
• Between direct and representative democracy
• Voters can delegate their vote to other voters
CC-BY-SA Ilmari Karonen
58. Steffen Staab Web Science 58
CC-BY-SA Ilmari Karonen
Delegative Democracy
• Between direct and representative democracy
• Voters can delegate their vote to other voters
• Delegations can be revoked at any time
59. Steffen Staab Web Science 59
CC-BY-SA Ilmari Karonen
Delegative Democracy
• Between direct and representative democracy
• Voters can delegate their vote to other voters
• Delegations can be revoked at any time
• Votes are public!
61. Steffen Staab Web Science 61
LiquidFeedback – Pirate Party
• Observation: 08/2010 – 11/2013
• 13,836 Members
• 14,964 Delegations
• 499,009 Votes
62. Steffen Staab Web Science 62
LiquidFeedback – German Pirate Party
•
Users create initiatives, which are grouped by
issues and belong to areas
63. Steffen Staab Web Science 63
LiquidFeedback – German Pirate Party
•
Users create initiatives, which are grouped by
issues and belong to areas
Area: Environmental issues
Issue: CO2 output has to be reduced.
Initiative: Subsidise wind turbines!
64. Steffen Staab Web Science 64
LiquidFeedback – German Pirate Party
•
Users create initiatives, which are grouped by
issues and belong to areas
Area: Environmental issues
Issue: CO2 output has to be reduced.
Initiative: Subsidise wind turbines!
Areas: 22
Issues: 3,565
Initiatives: 6,517
65. Steffen Staab Web Science 65
LiquidFeedback – German Pirate Party
•
Users create initiatives, which are grouped by
issues and belong to areas
Delegations on global, initiative, issue
and area level
→ “Back-delegations” possible
73. Steffen Staab Web Science 73
Investigation
• What are the social processes?
• How are data collected?
• Which algorithms are used?
• How is the algorithm used in system processes?
• What are the system effects?
Purposes of investigation
• Social sciences: Observe bias in action!
• Intervention: Change the overall system!
• System sciences: Understand system limits!
• Engineering: Build „proper“ technical system!
Summary: Bias
80. Steffen Staab Web Science 80
Recommender Systems
Predict who I will add as friend next
Standard algorithm: find friends-of-friends
me
81. Steffen Staab Web Science 81
Friend of a Friend
1 2 4 5 6
3
Count the number of ways a person can be
found as the friend of a friend.
82. Steffen Staab Web Science 82
Generative Models
Example:
Link Creation by Barabasi-Albert
83. Steffen Staab Web Science 83
• Many large networks are scale free
– Matthew effect: rich get richer
• Vote delegation!
• The degree distribution has a power-law behavior for
large k (far from a Poisson distribution)
• Random graph theory and the Watts-Strogatz model
cannot reproduce this feature
General considerations
84. Steffen Staab Web Science 84
Scale free
Same shape of hull,
no matter which resolution
85. Steffen Staab Web Science 85
Scale-free of Web distribution
Same shape of distribution: no matter which k
86. Steffen Staab Web Science 86
Two generic mechanisms common in many real
networks:
• Growth (www, research literature, ...)
• Preferential attachment:
attractiveness of popularity
The two are necessary
Barabasi-Albert model (1999)
Barabási & Albert, Science 286, 509 (1999)
87. Steffen Staab Web Science 87
• t=0, m0 nodes
• Each time step we add a new node with m ( m0)
edges that link the new node to m different nodes
already present in the system
Growth
88. Steffen Staab Web Science 88
• When choosing the nodes to which the new
connects, the probability that a new node will be
connected to node i depends on the degree ki of
node
Preferential attachment
( ) i
i
j
j
k
k
k
Linear attachment (more general models)
Sum over all existing nodes
89. Steffen Staab Web Science 89
Numerical simulations
• Power-law P(k) k-
SF=3
• The exponent does not depend on m (the
only parameter of the model)
91. Steffen Staab Web Science 91
• 1945 Vannevar Bush, „As we may think“, Memex
• 1962 Ted Nelson, Hypertext
• 1965 Wide area network
• 1968 Doug Engelbart, The mother of all demos
• 1972 Public Arpanet, Email
• 1974-82 Internet protocol TCP/IP
• 1978 Consumer information services & Email
• 1983 AOL, online service for games, communities...
• 1984 Domain name service
Pre-Web
92. Steffen Staab Web Science 92
The World Wide Web
• 1989 Concept drafted by Tim Berners-Lee
• 1993 National Center for SuperComputing
Applications launched Mosaic X
• 1994 First WWW conference
• 1994 W3C started at MIT
• Commercial websites began their proliferation
• Followed by local school/club/family sites
• The web exploded
– 1994 – 3,2 million hosts and 3,000 websites
– 1995 – 6,4 million hosts and 25,000 websites
– 1997 – 19,5 million hosts and 1,2 million websites
– January 2001 – 110 million hosts and 30 million websites
93. Steffen Staab Web Science 93
The World Wide Web
– 1994/1995 Amazon
– 1994/1995 Wiki
– 1995 AltaVista Search Engine
– 1995 Internet Explorer
– 1997-2001 Browser wars
– 1996-1998 XML recommendation
– 1998 Google
– 1999 First W3C recommendation on RDF (Semantic Web)
– 2001 Dot.Com bubble bursts
– 2001 Wikipedia
– 2003/2004 Facebook
– 2004 Flickr
– 2005 YouTube
94. Steffen Staab Web Science 94
Concepts Example Applications
Web of People Physical transport service (Uber (2009),
Lyft), accommodation service (AirBnB,
Couchsurfing), online dating service,
2009
Web of Things Smart city, ambient intelligence, personal
and public health information, personal
and public transport information
2007
Web of Services Cloud services, Digital transformation,
Programmable Web (2005)
2005
Web of Data User generated content applications
(Facebook, Wikipedia (2001) and
Wikidata,…), Linked open gov data
2001
Web of Documents HTTP, HTML, XML, Browser (Mosaic
1993)
1993
Computer
Networks/Internet
Document delivery (internet 1982), VOIP,
Streaming
1982
95. Steffen Staab Web Science 95
Internet of Things vs Web of Things
IoT
• Internet
• P2P networks
• Sensors
Web of things
• How to link?
• What is „linking WoT“?
• How to use by me?
• How to discover?
• How to search?
97. Steffen Staab Web Science 97
http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif
98. Steffen Staab Web Science 98
Aaron Swartz †
Co-author of RSS1.0 at the age of 14
99. Steffen Staab Web Science 99
http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif
100. Steffen Staab Web Science 100
Web of People
• What is it?
– Identification
• Orcid
• Facebook ID
• Oauth 2.0
?
– Trust
• Uber score
• Airbnb score
• Ebay score
• Tinder score
• Group score
– Chinese social score
– Credit rating score
?
101. Steffen Staab Web Science 101
Concepts Delivered technical capabilities
Web of People Identification and rating by/of people
Web of Things Identification, linking, aggregation,
monitoring and controlling of things
Web of Services Identification, composition and calling of
services
Web of Data Identification, linking and retrieval of data
Web of Documents Identification, linking and retrieval of
documents
Computer
Networks/Internet
Identification of and communication
between computers
102. Steffen Staab Web Science 102
Concepts Standards
Web of People No mature standards yet
Web of Things No mature standards yet
Web of Services REST, JSON, JSONLD
Web of Data RDF, SPARQL
Web of Documents HTTP, HTML, XML, AJAX
Computer networks/Internet Internet, TCP/IP, Optical fibre, 5G
105. Steffen Staab Web Science 105
What is in the
data?
What is in the
algorithm?
What is in the
Social machine?
WebObservatory
106. Steffen Staab Web Science 106
Telling the Story with Web Science
What is in the
Data?
What is in the
Algorithm?
What is in the
Social Machine?
Story telling
Under-
standing
Modelling
107. Steffen Staab Web Science 107
Web
Accomplishments
• Web of Services
• Web of Data
• Web of Documents
• Computer
networks/Internet
Future in the making
• Web of People
• Web of Things
How to identify and use?
How to observe?
How to resolve issues?