Lecture 3: Data Formats on the Social Web (2013)
1. Social Web
Lecture III
What does DATA look like on the Social Web?
Lora Aroyo
The Network Institute
VU University Amsterdam
Monday, March 4, 13
2. Assignment 1
image source: http://www.flickr.com/photos/bionicteaching/1375254387/
Monday, February 18, 13
3. • Provide analysis of privacy issues on the (Social) Web
• three articles <--> three mind maps <--> main Social Web privacy issues
• write for people who didn’t attend the course (max 3 pages)
• Provide analysis of current privacy-related public initiatives
• legal contexts for privacy and ownership
• compare the intentions of both initiatives (advantages & disadvantages)
• your own vision on how this impacts the future of the social web
• your own advice to policy makers with regard to privacy on the web.
• links to Net Neutrality (max 2 pages)
• link to Hands-on session: what would change if SOPA/PIPA or ACTA
were active – would you still have access to the information you pulled in
for the assignments? Illustrate your answer showing what changes could
appear in the graph from exercise 4 (Hands-on session 2) and explain
why. (max 1 page)
• all visuals, e.g. screenshots, diagrams, etc. in appendix and use template
• Deadline: 22 February 23:59
4. What do people
contribute on
the SW?
5. Structure on the Web
• In the evolution of the Web, Semantic Web
refers to an approach to add ‘semantics’ to
the web, by naming terms in a domain
• A specification of such terms is called an
‘ontology’
• For software: ontologies help to effectively
use content on the Web (like DB schemas)
6. History & Nature
of Blogs
• evolved from online diary (in the 1980’s)
• the term blog coined in late 1990’s
• Blog = weB LOG (Jorn Barger in 1997)
• = we blog (Peter Merholz 1999)
• one of the first ways people could contribute content on
the Web themselves
• Nature: political, technical, art, journalistic, cultural, personal
• Software: WordPress, Blogger, LiveJournal
7. Types of Blogs
• Single- or Multi-authored
• Photo-blog, Video-blog, Audio-blog
• Life (b)log, now micro life-blog (Twitter)
• lifecasting: in 2007 by Justin Kan: webcam on a cap
• Gordon Bell MyLifeBits: Microsoft SenseCam
http://www.justin.tv/
http://research.microsoft.com/en-us/projects/mylifebits/
8. Wikis
• Wiki means fast/quick in Hawaiian
• "the simplest online database that
could possibly work" (Ward
Cunningham), 1995
• first wiki software: WikiWikiWeb
(the QuickWeb)
http://en.wikipedia.org/wiki/Ward_Cunningham
http://en.wikipedia.org/wiki/WikiWikiWeb
9. Wiki Features
• a website powered by wiki software
• created and maintained collaboratively by multiple users
= an ongoing process that constantly changes the site
• not a carefully crafted site for casual visitors
• users can add, modify or delete content
• to obtain meaningful topic associations between
different pages, page link creation is easy
• Examples: community websites, corporate intranets,
knowledge management systems, and note taking
10. Wiki Implementation
• as an application server that runs on one or more web servers
• content is stored in a file system, and changes to the content
are stored in a relational database management system
• a commonly used software package is MediaWiki
(known from Wikipedia)
• page structure & formatting: simplified markup language
(wikitext)
• style & syntax of wikitext vary among wiki implementations
(some also allow HTML tags or use WYSIWYG editing)
• Issues: control of editing & changes, trust & security
13. Exploiting the crowd
• in wiki applications the crowd
contributes collective
intelligence (textual)
• later other media & resources
emerged, e.g., photo, video, music
• crowdsourcing
14. Example
• in 1770 Wolfgang von Kempelen built The Turk
• in 2005 Amazon introduced the Amazon Mechanical Turk
• marketplace for work; people perform tasks computers
are lousy at, e.g. identifying items in a photo/video,
writing product descriptions, transcribing podcasts
• HITs = human intelligence tasks
• require very little time & offer very little compensation
• workers & requesters
17. Question?
Was the $1 million Netflix prize a victory for crowdsourcing?
18. 5 Rules of the New
Labor Pool
• The crowd is dispersed and can perform a range
of tasks – from the most rote to the highly specialized
• The crowd has a short attention span, so jobs
need to be broken into “micro-chunks”
• The crowd is full of specialists
• The crowd produces mostly crap - no increase in
the amount of talent – the challenge is to find and
leverage that talent
• The crowd finds the best stuff - finds the best
material and corrects errors
By Jeff Howe
19. Folksonomy
• On the social web the user-generated content is
organized in light-weight ontologies, i.e., folksonomies
• Community-based semantics = a relationship between
Users, Tags & Resources
• user-created, bottom-up classification/categorization
of (domain) terms / user-labels, e.g., tags
• tagging = the social process where lay users attach
labels to resources (as opposed to annotation by
professional experts)
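The Users, Tags & Resources relationship can be sketched as a list of (user, tag, resource) triples. The snippet below is a minimal illustration only: the user names, tags and resource identifiers are invented, and the two helper functions are hypothetical names chosen for this sketch.

```python
from collections import Counter, defaultdict

# Each tagging act is one (user, tag, resource) triple; all values are made up.
taggings = [
    ("alice", "amsterdam", "photo-1"),
    ("bob", "amsterdam", "photo-1"),
    ("bob", "canal", "photo-1"),
    ("carol", "canal", "photo-2"),
    ("alice", "bike", "photo-2"),
]


def tags_per_resource(triples):
    """Count how often each tag was attached to each resource."""
    counts = defaultdict(Counter)
    for _user, tag, resource in triples:
        counts[resource][tag] += 1
    return counts


def resources_for_tag(triples, tag):
    """Return the set of resources that any user labelled with `tag`."""
    return {resource for _user, t, resource in triples if t == tag}


counts = tags_per_resource(taggings)
print(counts["photo-1"].most_common(1))
print(sorted(resources_for_tag(taggings, "canal")))
```

The bottom-up character of a folksonomy shows in the counts: no one declares "amsterdam" to be the label for photo-1, it simply emerges as the tag most users chose.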
27. Vocabularies on the
(Social) Web
• to create interfaces or exchange data
between applications the software needs to
know the terms in the data
• vocabularies define a set of terms in a certain
domain, e.g., describing people,
relationships, content of different type
28. FOAF
• FOAF = Friend of a Friend
• a machine-readable ontology describing persons, their activities
& their relations to other people and objects
• an open, decentralized technology for connecting social Web sites, &
the people they describe
Linked Data & FOAF
• model for publishing simple factual data via a network of
linked RDF documents
• FOAF is an attempt to use the Web to:
• integrate factual information with information in
human-oriented documents (e.g. videos, books, spreadsheets, 3d models)
• and info that is still in people's heads
• linking networks of information with networks of people
• http://www.foaf-project.org/, 2000
29. FOAF Vocabulary
• Gradual evolution since mid-2000
• Stable core of classes and properties
• New terms may be added at any time
• FOAF RDF namespace URI is fixed
• http://xmlns.com/foaf/spec/
30. FOAF Files
• Documents that adopt the conventions of RDF and may be written in XML,
RDFa or N3
• Contain FOAF vocabulary and other RDF vocabularies
• FOAF defines classes, e.g. foaf:Person, foaf:Document, foaf:Image
• FOAF defines properties of those things, e.g. foaf:name, foaf:homepage
• FOAF defines relationships that hold between members of these
categories, e.g. foaf:depiction relates something (e.g. a foaf:Person) to a
foaf:Image
31. FOAF Example
• there is a foaf:Person
• with a foaf:name property of 'Dan Brickley'
• in foaf:homepage and foaf:openid relationships to a thing called http://danbri.org/
• in foaf:img relationship to a thing referenced by a relative URI of /images/me.jpg
Create your own FOAF file: http://www.ldodds.com/foaf/foaf-a-matic
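The four statements above can be written down as RDF triples. The following is a minimal sketch that serializes them as N-Triples using only the Python standard library. It assumes the description is published at http://danbri.org/ (so the relative image URI resolves against that base) and uses a blank node for the person; both are assumptions made for illustration, and `to_ntriples` is a hypothetical helper, not part of any FOAF tool.

```python
from urllib.parse import urljoin

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://danbri.org/"  # assumed document base for the relative image URI

person = "_:person"  # blank node standing for "there is a foaf:Person"

# (subject, predicate IRI, already-formatted object) rows
triples = [
    (person, RDF_TYPE, f"<{FOAF}Person>"),
    (person, FOAF + "name", '"Dan Brickley"'),
    (person, FOAF + "homepage", f"<{BASE}>"),
    (person, FOAF + "openid", f"<{BASE}>"),
    (person, FOAF + "img", f"<{urljoin(BASE, '/images/me.jpg')}>"),
]


def to_ntriples(rows):
    """Serialize the rows as N-Triples, one statement per line."""
    return "\n".join(f"{s} <{p}> {o} ." for s, p, o in rows)


ntriples = to_ntriples(triples)
print(ntriples)
```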
32. FOAF Auto-Discovery
• If you publish a FOAF self-description (e.g. using
foaf-a-matic) you can make it easier for tools to
find your FOAF by putting markup in the head of
your HTML homepage
• foaf.rdf is a common choice of filename
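A tool could discover such a file roughly as follows. This is a sketch, not a reference implementation: it assumes the homepage advertises the file with a `link` element of type `application/rdf+xml` (the attribute values in the sample page follow a widely used convention but should be treated as an assumption), and the page, the base URL and the class name `FoafLinkFinder` are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FoafLinkFinder(HTMLParser):
    """Collect absolute URLs of RDF/XML documents linked from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.foaf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("type") == "application/rdf+xml" and a.get("href"):
            # Resolve relative hrefs such as "foaf.rdf" against the page URL.
            self.foaf_urls.append(urljoin(self.base_url, a["href"]))


# Hypothetical homepage; only the first link element points at RDF.
page = """<html><head>
<title>Example homepage</title>
<link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head><body></body></html>"""

finder = FoafLinkFinder("http://example.org/~jamie/")
finder.feed(page)
print(finder.foaf_urls)
```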
36. SIOC
• Semantically-Interlinked Online Communities
• a standard way for expressing user-generated content, i.e.,
enable the integration of online community information
• methods for interconnecting discussions, e.g., blogs, forums
& mailing lists
• Ontology for representing rich data from Social Web in RDF
• commonly used in conjunction with the FOAF
vocabulary for expressing personal profile and social
networking information
• http://sioc-project.org/
37.
<sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections"> <!-- (1) -->
  <dc:title>Creating connections between discussion clouds with SIOC</dc:title>
  <dcterms:created>2006-09-07T09:33:30Z</dcterms:created> <!-- (2) -->
  <sioc:has_container rdf:resource="http://jbreslin.com/blog/index.php?sioc_type=site#weblog"/>
  <sioc:has_creator>
    <sioc:UserAccount rdf:about="http://jbreslin.com/blog/author/cloud/" rdfs:label="Cloud"> <!-- (3) -->
      <rdfs:seeAlso rdf:resource="http://jbreslin.com/blog/index.php?sioc_type=user&amp;sioc_id=1"/> <!-- (6) -->
    </sioc:UserAccount>
  </sioc:has_creator>
  <foaf:maker rdf:resource="http://jbreslin.com/blog/author/cloud/#foaf"/>
  <sioc:content>SIOC provides a unified vocabulary for content and interaction description: a semantic la
  that can co-exist with existing discussion platforms. <!-- (5) -->
  </sioc:content>
  <sioc:topic rdfs:label="Semantic Web" rdf:resource="http://jbreslin.com/blog/category/semantic-web/"/> <!-- (4) -->
  <sioc:topic rdfs:label="Blogs" rdf:resource="http://jbreslin.com/blog/category/blogs/"/>
  <sioc:has_reply> <!-- (7) -->
    <sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections/#comment-123928">
      <rdfs:seeAlso rdf:resource="http://johnbreslin.com/blog/index.php?sioc_type=comment&amp;sioc_id=123928"/> <!-- (8) -->
    </sioc:Post>
  </sioc:has_reply>
</sioc:Post>
• A post (1) titled "Creating connections between discussion clouds with SIOC" (2)
created at 09:33:30 on 2006-09-07 (3) written by user "Cloud" (4) on topics
"Blogs" and "Semantic Web" (5) with contents described in sioc:content.
• (6) More information about its author at
http://johnbreslin.com/blog/index.php?sioc_type=user&sioc_id=1
• The post has (7) a reply and (8) detailed SIOC information about this reply can be
found at
http://johnbreslin.com/blog/index.php?sioc_type=comment&sioc_id=123928
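To show how software might read such a description, here is a minimal sketch that pulls the title, creation time, author label and topic labels out of a shortened copy of the post. It treats the data as plain XML with `xml.etree.ElementTree` rather than using a real RDF parser, which is a simplification; the wrapping `rdf:RDF` element with its namespace declarations is added so the fragment parses on its own, and the `summary` dictionary is just an illustrative structure.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDFS_LABEL = "{%s}label" % NS["rdfs"]

# Shortened copy of the post, wrapped in rdf:RDF so that it is self-contained.
doc = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:sioc="http://rdfs.org/sioc/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections">
    <dc:title>Creating connections between discussion clouds with SIOC</dc:title>
    <dcterms:created>2006-09-07T09:33:30Z</dcterms:created>
    <sioc:has_creator>
      <sioc:UserAccount rdf:about="http://jbreslin.com/blog/author/cloud/" rdfs:label="Cloud"/>
    </sioc:has_creator>
    <sioc:topic rdfs:label="Semantic Web" rdf:resource="http://jbreslin.com/blog/category/semantic-web/"/>
    <sioc:topic rdfs:label="Blogs" rdf:resource="http://jbreslin.com/blog/category/blogs/"/>
  </sioc:Post>
</rdf:RDF>"""

root = ET.fromstring(doc)
post = root.find("sioc:Post", NS)
author = post.find("sioc:has_creator/sioc:UserAccount", NS)

summary = {
    "title": post.findtext("dc:title", namespaces=NS),
    "created": post.findtext("dcterms:created", namespaces=NS),
    "author": author.get(RDFS_LABEL),
    "topics": [t.get(RDFS_LABEL) for t in post.findall("sioc:topic", NS)],
}
print(summary)
```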
42. Activity Streams
• A list of recent activities performed by someone on a
website
• Example: Facebook News Feed
• Activity Streams project aims at an activity stream
protocol to syndicate activities across social Web applications
• Major websites with activity stream implementations
have already opened up their activity streams to developers
to use, e.g., Facebook and MySpace
• http://activitystrea.ms/
43. Activity Streams
Specification
• an actor, a verb, an object and a target
• person performing an action on/with an object
• Geraldine posted a photo to her album
• John shared a video
• activity metadata to present to a user in a rich human-friendly
format, e.g. constructing readable sentences about the activity
that occurred, visual representations of the activity, or
combining similar activities for display
• Activities are serialized using the JSON format
• There is also an ATOM-oriented specification
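The actor / verb / object / target structure can be sketched as a JSON document. The example below encodes "Geraldine posted a photo to her album" and turns it back into a readable sentence. The property names (`actor`, `verb`, `object`, `target`, `objectType`, `displayName`, `published`) follow the JSON Activity Streams 1.0 specification; the timestamp, the object types chosen and the `render` helper with its small verb table are assumptions made for this illustration.

```python
import json

# Illustrative activity; all values are invented.
activity = {
    "published": "2013-02-18T15:04:55Z",
    "actor": {"objectType": "person", "displayName": "Geraldine"},
    "verb": "post",
    "object": {"objectType": "image", "displayName": "a photo"},
    "target": {"objectType": "collection", "displayName": "her album"},
}

# Hypothetical lookup table for building a past-tense sentence.
PAST_TENSE = {"post": "posted", "share": "shared"}


def render(act):
    """Build a human-readable sentence from an activity dictionary."""
    parts = [
        act["actor"]["displayName"],
        PAST_TENSE.get(act["verb"], act["verb"]),
        act["object"]["displayName"],
    ]
    target = act.get("target")
    if target:
        parts += ["to", target["displayName"]]
    return " ".join(parts)


# Serialize and parse again, as an application receiving the stream would.
wire = json.dumps(activity)
received = json.loads(wire)
print(render(received))
```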
44. Activity Streams
Example
http://activitystrea.ms/specs/json/1.0/
45. Activity Streams
Example
http://activitystrea.ms/specs/json/1.0/
48. XFN
• XHTML Friends Network
• relationships between individuals: by defining a small set of values
that describe personal relationships
• In HTML and XHTML documents, these are given as values for
the rel attribute on a hyperlink. XFN allows authors to indicate
which of the weblogs they read belong to friends, whom they've
physically met, and other personal relationships. Using XFN
values, which can be listed in any order, people can humanize
their blogrolls and links pages, both of which have become a
common feature of weblogs.
• using XFN, authors can easily style all links of a particular type; thus, friends
could be boldfaced, co-workers italicized, etc.
• http://gmpg.org/xfn/
49. XFN Example
• Joe has a set of five links in his blogroll: his girlfriend
Jane; his friends Dave and Darryl; industry expert James,
who Joe briefly met once at a conference; and
MetaFilter.
• MetaFilter gets no value since it is not an actual person
http://gmpg.org/xfn/intro
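Joe's blogroll could be marked up and mined along these lines. This is a sketch under assumptions: the example.org addresses are placeholders, the rel values assigned to each person are plausible choices for the relationships described rather than a quotation of the XFN introduction, and `XfnParser` is a hypothetical class name. The set of recognised values is the XFN 1.1 vocabulary.

```python
from html.parser import HTMLParser

# The XFN 1.1 relationship values.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XfnParser(HTMLParser):
    """Map each link target to the set of XFN values found in its rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        # rel values may appear in any order; keep only the XFN ones.
        self.links[href] = set((a.get("rel") or "").split()) & XFN_VALUES


# Hypothetical blogroll for Joe.
blogroll = """
<ul>
  <li><a href="http://jane.example.org/" rel="sweetheart date met">Jane</a></li>
  <li><a href="http://dave.example.org/" rel="friend met">Dave</a></li>
  <li><a href="http://darryl.example.org/" rel="friend met">Darryl</a></li>
  <li><a href="http://www.metafilter.com/">MetaFilter</a></li>
  <li><a href="http://james.example.org/" rel="met">James</a></li>
</ul>
"""

parser = XfnParser()
parser.feed(blogroll)
friends = sorted(url for url, rel in parser.links.items() if "friend" in rel)
print(friends)
```

A graph of people can then be drawn by treating each page as a node and each XFN-annotated link as a labelled edge.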
50. 5 people who’ve met
friends vs. acquaintances
colleagues vs. co-workers
love vs. family
http://gmpg.org/xfn/intro
51. Open Graph
• protocol originally developed at Facebook
• enables any web page to become a rich object in a social graph, i.e. to
have the same functionality as any other object on Facebook
• Basic Metadata: to turn your web pages into graph objects
• og:title = title of your object e.g., "The Rock"
• og:type = type of your object e.g.,
"video.movie"
• og:image = image URL to represent your object
within the graph
• og:url = canonical URL of your object that will
be used as its permanent ID in the graph, e.g.,
"http://www.imdb.com/title/tt0117500/"
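As a rough sketch of how a consumer reads this basic metadata, the snippet below collects the `og:` properties from `meta` elements in a page head and checks that the four basic ones are present. The page is invented for the example (the image URL in particular is a placeholder), and `OpenGraphParser` is a hypothetical class name.

```python
from html.parser import HTMLParser

REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property") or ""
        if prop.startswith("og:") and a.get("content") is not None:
            self.properties[prop] = a["content"]


# Hypothetical page head; the image URL is a placeholder.
page = """<html prefix="og: http://ogp.me/ns#"><head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://example.org/rock.jpg" />
</head><body></body></html>"""

og = OpenGraphParser()
og.feed(page)
missing = [name for name in REQUIRED if name not in og.properties]
print(og.properties["og:title"], og.properties["og:type"], missing)
```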
52. OGP: Explained
• “Like” button on each of your posts
• Open Graph Protocol (OGP) to mark up content
• prefix="og: http://ogp.me/ns#" specifies the OGP
vocabulary
53. OGP Explained
1. import the Dublin Core & Open Graph
Protocol vocabularies using the
prefix attribute
2. associate the prefixes dc and og with the
URL of each vocabulary
3. use dc:creator and og:title,
which are short-hand for the full
vocabulary term URLs
http://purl.org/dc/terms/creator
and http://ogp.me/ns#title,
respectively
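The short-hand expansion in step 3 amounts to a simple string substitution. The sketch below assumes the dc prefix is bound to the Dublin Core terms namespace http://purl.org/dc/terms/ (an assumption about the slide's markup) and og to http://ogp.me/ns#; the two function names are hypothetical.

```python
def parse_prefix_attribute(value):
    """Turn 'dc: http://... og: http://...' into a {prefix: namespace} mapping."""
    tokens = value.split()
    # Tokens alternate between 'name:' and the namespace URL.
    return {name.rstrip(":"): uri for name, uri in zip(tokens[0::2], tokens[1::2])}


def expand(term, prefixes):
    """Expand a prefixed term such as 'og:title' to its full URL."""
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


# Assumed value of the HTML prefix attribute.
prefixes = parse_prefix_attribute(
    "dc: http://purl.org/dc/terms/ og: http://ogp.me/ns#"
)
print(expand("dc:creator", prefixes))
print(expand("og:title", prefixes))
```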
55. RDFa
http://rdfa.info/play/
• another syntax for RDF
• HTML5 extension for People, Places, Events, Recipes, Reviews markup
• e.g., specify that a text is the name of a product, or person, or
event = “adding semantic markup”.
• RDFa 1.1 = specified for XHTML and HTML5 (for any XML-based
language, e.g., SVG)
• RDFa Lite = “a small subset of RDFa consisting of a few attributes that
may be applied to most simple to moderate structured data markup
tasks.”
• Publish your data as Linked Data through RDFa --> link to other
URIs (others can link to your HTML+RDFa)
56. Microformats
• A set of simple, open data formats built upon
existing and widely adopted standards
• Designed for humans first and machines second
• Design principles for formats
• Highly correlated with semantic XHTML (aka
real-world semantics, lowercase semantic web,
lossless XHTML)
• “An evolutionary revolution”, by Ryan King
58. Your first microformat
• You can put a microformat on your website in less than 5 mins
• Example: putting an hCard (online business card) on your site
1. Find your name somewhere on your website
2. Wrap your name in an fn (formatted name)
<span class="fn">Jamie Jones</span>
3. Wrap it all in a vcard (declares that everything inside is the hCard microformat):
<span class="vcard"><span class="fn">Jamie Jones</span></span>
<address class="vcard"><span class="fn">Jamie Jones</span></address>
The address element indicates that the person in the hCard is the contact for the page
<p class="vcard">My name is <span class="fn">Jamie Jones</span> I dig microformats!</p>
http://microformats.org/get-started
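For the consuming side, the following sketch extracts the formatted name from each hCard by looking for an element with class fn nested inside an element with class vcard. It is deliberately simple: it assumes well-nested markup without void elements such as br or img, and `HCardParser` is a hypothetical class name rather than an existing microformats library.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect the text of every class="fn" element inside a class="vcard"."""

    def __init__(self):
        super().__init__()
        self.stack = []        # class sets of the currently open elements
        self.names = []
        self._fn_depth = None  # stack depth at which the open fn element sits
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        inside_vcard = any("vcard" in c for c in self.stack)
        if "fn" in classes and inside_vcard and self._fn_depth is None:
            self._fn_depth = len(self.stack)
            self._buffer = []

    def handle_data(self, data):
        if self._fn_depth is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._fn_depth == len(self.stack):
            self.names.append("".join(self._buffer).strip())
            self._fn_depth = None
        if self.stack:
            self.stack.pop()


# The three hCard variants from the steps above, plus an fn outside any vcard.
html = """
<span class="vcard"><span class="fn">Jamie Jones</span></span>
<address class="vcard"><span class="fn">Jamie Jones</span></address>
<p class="vcard">My name is <span class="fn">Jamie Jones</span> I dig microformats!</p>
<span class="fn">Not in a card</span>
"""

cards = HCardParser()
cards.feed(html)
print(cards.names)
```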
59. Further microformats
• Add more information to your hCard
• Link to your friends and contacts with XFN
• Add events to your site with hCalendar
• Review movies, books, and more with hReview
http://microformats.org/get-started
60. HTML Microdata
• HTML Microdata allows machine-readable
data to be embedded in HTML documents in an
easy-to-write manner, with an unambiguous
parsing model
• It is compatible with numerous other data
formats including RDF and JSON
• Microdata DOM API
• http://www.w3.org/TR/microdata/
61. Microdata Syntax
• Microdata consists of a group of name-value pairs.
The groups are called items, and each name-value
pair is a property
• itemscope is used to create an item
• itemprop is used to add a property to an item
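The item / property model can be illustrated with a small extractor. This sketch handles only flat, top-level items: it does not support nested items, itemref or itemtype, and it assumes well-nested markup without void elements. For a and time elements it takes the value from the href and datetime attributes, otherwise from the element's text. The sample markup, the property names and the class name `MicrodataParser` are invented for the example.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Extract flat top-level microdata items as dictionaries."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0
        self._item_depth = None  # depth of the open itemscope element
        self._prop = None        # (name, depth) of a property awaiting its text
        self._text = []

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        a = dict(attrs)
        if "itemscope" in a and self._item_depth is None:
            self.items.append({})  # itemscope creates a new item
            self._item_depth = self._depth
            return
        name = a.get("itemprop")
        if name and self._item_depth is not None:
            if tag == "a" and a.get("href"):
                self.items[-1][name] = a["href"]
            elif tag == "time" and a.get("datetime"):
                self.items[-1][name] = a["datetime"]
            else:
                self._prop = (name, self._depth)
                self._text = []

    def handle_data(self, data):
        if self._prop:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop and self._prop[1] == self._depth:
            self.items[-1][self._prop[0]] = "".join(self._text).strip()
            self._prop = None
        if self._item_depth == self._depth:
            self._item_depth = None
        self._depth -= 1


# Hypothetical item with three properties: a text value, a URL and a time.
html = """
<div itemscope>
  <p>My name is <span itemprop="name">Jamie Jones</span>.</p>
  <p>Home page: <a itemprop="url" href="http://example.org/">example.org</a></p>
  <p>Joined <time itemprop="joined" datetime="2013-02-18">18 February 2013</time></p>
</div>
"""

md = MicrodataParser()
md.feed(html)
print(md.items)
```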
62. Microdata Example
[Figure: annotated Microdata markup; callouts: top-level item, 3 properties, URL, Time]
63. schema.org
• Google, Yahoo!, Bing
• a common vocabulary for
structured data markup on
web pages
• improve how sites appear in
major search engines
• Google rich snippets of
reviews, people, recipes,
events since 2009
65. Knowledge
Graph
• graph that understands real-
world entities and their
relationships to one another:
things, not strings
• more than 500 million objects
• more than 3.5 billion facts
about and relationships
between these different objects
• tuned based on what people
search for
68. Are schema.org &
Open Graph really needed?
Can they work together?
69. Question?
For which things on the social web would more vocabularies
for embedded semantics be needed (besides what we have
already seen)?
value of social
(personal) data?
70. Hands-on Teaser
• mining data in various social web
formats
• see the differences in what each of the
formats can contain & what purpose
they serve
• start: simple search where we pull in
some XFN data and visualise a graph of
people that we find on a website
• check: software you will be working
with on the website
image source: http://www.flickr.com/photos/bionicteaching/1375254387/