Semantic Web SEO brings together a number of concepts that help increase library reach by making digital collections more accessible and visible. Intelligent search engines seek out and use well-structured linked data, which improves processing efficiency and lets them return more accurate results and a richer search experience for users. Semantic search places less importance on the wording of a query and uses probabilities and algorithms to determine the intent of the user. In this workshop we demonstrate how linked data concepts and Schema.org can be incorporated into digital libraries to improve search engines' contextual understanding of collections and deliver a better experience to their users.
SEO requires tools to measure the effect of your efforts and the value they produce. We will provide a framework and a Google Analytics Scorecard that digital repository collection managers, libraries, and their funders can use as a baseline for making informed decisions and tracking progress toward the goal of increasing access to and visibility of digital libraries.
Attendees of this workshop will gain knowledge in the following areas:
1. A basic understanding of Semantic Web SEO and its two most important concepts for digital repositories.
2. A baseline Google Analytics dashboard to support pre/post funding decisions, and the knowledge to get started.
3. A simpler way to set up and administer Google Analytics and Google Webmaster Tools for their entire organization and its stakeholders.
4. A basic understanding of how to incorporate Schema.org and linked data into a digital repository.
Session Leaders:
Kenning Arlitsch, Montana State University
Patrick OBrien, Montana State University
Semantic Web SEO: Using Linked Data and schema.org to improve Library Reach and Digital Repository Access
1. Semantic Web SEO: Using Linked Data and schema.org to improve Library Reach and Digital Repository Access
Kenning Arlitsch & Patrick OBrien
DLF Fall, Denver, Colorado
November 5, 2012
2. Today’s Objectives
- Basic understanding of
  - Semantic Web SEO for digital repositories
  - How to get started incorporating Schema.org and linked data into a digital repository
- Implement baseline metrics to support pre/post funding decisions of digital repositories
  - Simplify setup and administration of Google Analytics and Google Webmaster Tools for an organization and its stakeholders
  - Implement Digital Repository SEO Google Analytics dashboard
3. Agenda
- Why SEO & the Semantic Web Matter
  - Performance & Accountability
  - The semantics of what really matters today
- How to Get Started
  - SEO Administration at an Institutional Scale
  - Enhance Your Data
  - Clean up Your Data
4. You cannot evaluate what you do not measure
"We cannot call a digital-library or electronic-publishing system a success if we cannot measure and interpret its use"
Ann Peterson Bishop, “Logins and Bailouts: Measuring Access, Use, and Success in Digital Libraries,” The Journal of Electronic Publishing, Volume 4, Issue 2, December 1998
5. Funding providers want more accountability and demonstrated value*
- “IMLS is focusing on areas where it can best effect change and measure its results.”**
- The IMLS assessment model will “identify effective museum and library services through performance monitoring” among other things.**
* ACRL Research Planning and Review Committee, “2010 top ten trends in academic libraries,” June 2010
**Institute of Museum and Library Services. 2011. “Creating a Nation of Learners; IMLS Five-Year Strategic Plan 2012–2016”
6. Accountability extends beyond granting agencies
- State Legislatures
  - Local taxpayers
- University administration
- Library administration
- Donors
- Association of Research Libraries statistics
7. Accountability at the Institutional level
- Enable all your Stakeholders
  - Collection Managers
  - IT Personnel
  - Administrators
- Avoid the free-for-all of silos
- Establish an institutional master account
  - Administer rights
  - Everyone uses same baseline metrics and tools
8. 2010: began looking at proxy metrics for digital collection public accessibility and use
- 12+ Billion
  - Number of search queries submitted to Google each month by Americans*
- 12%
  - Percentage of our digital collection content in Google index
- 0.5%
  - Percentage of our USpace IR scholarly papers accessible to researchers using Google Scholar
* http://www.comscore.com/Press_Events/Press_Releases/2012/1/comScore_Releases_December_2011_U.S._Search_Engine_Rankings
9. Basic SEO has improved collection accessibility in Google across the board…
Google Index Ratio - All Collections*
            07/05/10   04/04/11   11/30/11
Average       12%        51%        79%
High**        37%        87%       100%
* Google Index Ratio = URLs indexed by Google / URLs submitted, for about 150 collections containing ~170,00 URLs
**Highest index ratio achieved for Collections with over 500 URLs submitted to Google
10. …almost 100% of USpace IR content is accessible to patrons using Google.
Google Index Ratio
                   07/05/10   11/19/10   10/16/11
ETD 1                12%        69%        97%
ETD 2                 0%        68%        98%
UScholar Works       23%        51%        98%
Board of Regents      4%        47%        97%
*October 16, 2011 Weighted Average Google Index Ratio = 97.82% (10,306/10,536).
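The index ratio arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not the tooling used at Utah: the function names are hypothetical, and the only numbers taken from the slides are the October 16, 2011 totals (10,306 URLs indexed out of 10,536 submitted).

```python
def index_ratio(indexed: int, submitted: int) -> float:
    """Share of submitted URLs that Google has indexed (0.0 to 1.0)."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return indexed / submitted


def weighted_index_ratio(collections: list[tuple[int, int]]) -> float:
    """Weighted average over (indexed, submitted) pairs.

    Weighting each collection by its submitted URLs is the same as
    dividing total indexed by total submitted, so large collections
    count for more than small ones.
    """
    total_indexed = sum(indexed for indexed, _ in collections)
    total_submitted = sum(submitted for _, submitted in collections)
    return index_ratio(total_indexed, total_submitted)


# Totals reported on the slide for October 16, 2011.
uspace_ratio = index_ratio(10_306, 10_536)
print(f"{uspace_ratio:.2%}")
```

Passing per-collection counts to weighted_index_ratio gives the same kind of weighted average as the footnote; those per-collection counts are not on the slides, so none are shown here.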
11. …resulting in more referrals and visitors
12 week comparison 2010 vs. 2012
13. Today’s Key Premise, Concepts & Focus
- SEO Goals are to increase access, visibility and use by patrons that value our content
- Semantic Web is a framework of standards and technologies to share, integrate and represent data as concepts across different content, information and system boundaries.
- Semantic Search incorporates the Semantic Web to understand the context and intent of users seeking information and the concepts contained within a document
14. Why semantic search is useful
- Perfect application for research & discovery of concepts
  - Apple Siri
  - IBM Watson
  - Google Knowledge Graph
- Making content Search Engine Readable & semantically Understandable can increase
  - click-through rates (CTR) by 15%*
  - organic traffic by 30%*
* http://searchengineland.com/how-to-get-a-30-increase-in-ctr-with-structured-markup-105830
19. 4 Major SEs committed to Schema.org as their semantic model
20. The 4 major SEs have committed to Schema.org as their semantic model
- SE Understandable
  - Schema.org is a mechanism (i.e., ontology) to communicate the meaning of your data
- SE Readable
  - Microdata and RDFa are the preferred way SEs read your data
- US submits 19 Billion queries per month to 3 of these SEs*
- We have not found any tools within reach of typical Library budgets, or skill sets, that are easily implementable
* http://www.comscore.com/Press_Events/Press_Releases/2012/1/comScore_Releases_December_2011_U.S._Search_Engine_Rankings
22. Created an SEO Scorecard designed to support pre/post funding decisions
- Assembled Team of
  - Collection Managers
  - Business School Group Project
  - 2nd Year MBA Team
- Focused on the 10 Google Analytics features that support
  - IMLS & NEH strategic plan
  - SEO Collection Manager Goals
23. Created an SEO Scorecard designed to support pre/post funding decisions
24. Workshop Process
- Diagrams and Process of what we did at Utah
- Live Demo Using Montana State (MSU)
- Information that would be helpful today
  - Access to your organization’s Admin Accounts (i.e., User ID & password)
    - Google Analytics
    - Google Webmaster Tools
  - An internal list server for your organization’s Managers responsible for making pre/post funding digital repository decisions
26. Steps for setting up Measurement & Evaluation for your Institution and Staff
  1. Associate a Google Account with your Institution
  2. Staff create their own Google Account using their Institution email address
  3. Activate Google Services using your Institution Google Account
     - Google Analytics
     - Google Webmaster Tools
  4. Add Staff to Google Services using their Institution email addresses
29. Step 1: Associate a Google Account* (Master) with your Institution
- Use an internal list server, e.g., seo@utah.edu
- Include managers who are responsible for administration
  - Google Analytics
  - Google Webmaster Tools
* https://accounts.google.com/NewAccount
36. Next steps are to test scalable tools and a repeatable process
- Found issues with most Analytics configurations
- We need study participants to evaluate and test the accuracy of additional analytics tools being developed under an IMLS grant program
37. What type of web analytics software does your IR use?
  A. Analytics Service
  B. Log Files
  C. Don't Know
  D. None
[Diagram: IR HTML pages measured by (A) an analytics service via JavaScript page tagging and (B) log files]
38. Both types have potential accuracy issues for IRs
  A. Analytics Services
    - Undercount non-HTML (e.g., PDF) file downloads
  B. Log Files
    - Overcount visits & downloads due to spiders, etc.
    - Undercount page views due to web caching, up to 30%
[Diagram: IR HTML pages measured by (A) an analytics service via JavaScript page tagging and (B) log files]
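To make the log-file overcount concrete, here is a rough Python sketch that counts PDF downloads in a web server log and separates requests whose user agent looks like a spider. Everything in it is an assumption for illustration: the server is assumed to write the common "combined" log format, the crawler substrings are an incomplete hand-picked list, and the sample log lines are made up rather than taken from any repository.

```python
import re

# Assumption: the server writes Apache/NGINX "combined" log format.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Illustrative, incomplete list of substrings found in crawler user agents.
BOT_TOKENS = ("bot", "spider", "crawl", "slurp")


def count_pdf_downloads(lines):
    """Return (human_downloads, spider_downloads) for successful PDF requests."""
    human = spider = 0
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not parse
        if match["status"] != "200" or not match["path"].lower().endswith(".pdf"):
            continue  # only count successful requests for PDF files
        agent = match["agent"].lower()
        if any(token in agent for token in BOT_TOKENS):
            spider += 1
        else:
            human += 1
    return human, spider


# Made-up sample entries, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [16/Oct/2011:10:00:00 -0600] "GET /files/713.pdf HTTP/1.1" 200 52311 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.20 - - [16/Oct/2011:10:00:05 -0600] "GET /files/713.pdf HTTP/1.1" 200 52311 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.30 - - [16/Oct/2011:10:00:09 -0600] "GET /index.html HTTP/1.1" 200 1043 "-" "Mozilla/5.0 (Macintosh)"',
]

print(count_pdf_downloads(SAMPLE_LOG))
```

A substring check like this catches only crawlers that identify themselves, so it reduces the overcount without eliminating it, and it does nothing about the caching undercount.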
39. Analytics Services do not track non-HTML downloads out of the box
[Diagram: (A) analytics service tracks HTML pages via JavaScript page tagging; non-HTML files need special configuration]
40. Analytics Services do not track non-HTML file downloads via direct external links
[Diagram: (A) analytics service tracks HTML pages via JavaScript page tagging; non-HTML files shown separately]
42. Traditional SEO is still very important, but not today’s focus.
- Descriptive Page Titles, anchor text, descriptions, etc.
- Easy & Intuitive Site Navigation
- Submit sitemaps/configure robots.txt file
- Monitor/address errors
- Inform staff & assign ownership
- Clean metadata
- Upgrade repository software
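As a small illustration of the sitemap item above, the following Python sketch builds a minimal sitemap document in the sitemaps.org format from a list of item URLs. The function name is hypothetical and the single URL is the abstract page cited elsewhere in this deck; a production sitemap would normally be generated by the repository software itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


item_urls = [
    "http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976",
]
print(build_sitemap(item_urls))
```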
43. Recommended Background Information
- Ronallo, Jason. "HTML5 Microdata and Schema.org." Code4Lib Journal (2012). http://journal.code4lib.org/articles/6400
- Arlitsch, Kenning, and Patrick OBrien. "Invisible Institutional Repositories: Addressing the Low Indexing Ratios of IRs in Google Scholar." Library Hi Tech 30, no. 1 (2012): 60-81. http://www.emeraldinsight.com/journals.htm?articleid=17020806
- Arlitsch, Kenning, and Patrick OBrien. "Search Engine Optimization (SEO) for Institutional Repositories." In Technical Advances for Innovation in Cultural Heritage Institutions (TAI CHI) Webinar Series; 2012 Mar 16; pp. 1-48. OCLC Research, Online Computer Library Center, Inc. (OCLC), 2012. http://www.oclc.org/resources/research/events/20120316seo.pdf
- Arlitsch, Kenning, and Patrick OBrien. "Search engine optimization (SEO) for digital repositories." In Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; 2011 Apr 4-5; San Diego, California, USA; pp. 1-25. J. Willard Marriott Library, University Libraries, University of Utah, 2011. http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf
44. Challenge is presenting structured data SEs can identify, parse and digest
Human Readable
Wolfinger, N. H., & McKeever, M. (2006, July). Thanks for nothing: changes in income and labor force participation for never-married mothers since 1982. In 101st American Sociological Association (ASA) Annual Meeting; 2006 Aug 11-14; Montreal, Canada (No. 2006-07-04, pp. 1-42). Institute of Public & International Affairs (IPIA), University of Utah.
Machine Understandable
46. However, Google cannot understand or read any of our “structured data”
- No Schema.org = Not Understandable
- No Microdata or RDFa = Not Readable
47. Workshop Exercise
Meta Tag | Working Paper
1 - citation_author | Arlitsch, Kenning; OBrien, Patrick
2 - citation_date | 2011-04-05
3 - citation_title | Search engine optimization (SEO) for digital repositories
6 - citation_volume |
7 - citation_issue |
8 - citation_firstpage | 1
9 - citation_lastpage | 25
10 - citation_doi |
13 - citation_keywords | SEO Tips, Special Collections, Digital Collection, Institutional Repository, Digital Repository
16 - citation_technical_report_institution | University of Utah
17 - citation_technical_report_number |
18 - citation_language | en
19 - citation_conference_title | Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; 2011 Apr 4-5; San Diego, California, USA
21 - citation_pdf_url | http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf
22 - citation_abstract_html_url | http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976
23 - University | University of Utah
24 - College | University Libraries
25 - Department | J. Willard Marriott Library
26 - subject.LCSH | Web search engines; Web sites--Registration with search engines; Digital libraries--Collection development
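One way to turn the exercise table into page markup is to render each filled-in row as an HTML meta element. The Python sketch below does that for the citation_* rows. It rests on two assumptions that are not stated on the slide: that each author is emitted as its own citation_author tag (splitting the semicolon-separated cell), and that blank rows are simply omitted. The function name is hypothetical.

```python
from html import escape

# Values copied from the workshop exercise table; blank rows are left out.
record = {
    "citation_title": "Search engine optimization (SEO) for digital repositories",
    # Assumption: one tag per author rather than one semicolon-joined value.
    "citation_author": ["Arlitsch, Kenning", "OBrien, Patrick"],
    "citation_date": "2011-04-05",
    "citation_firstpage": "1",
    "citation_lastpage": "25",
    "citation_language": "en",
    "citation_technical_report_institution": "University of Utah",
    "citation_pdf_url": "http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf",
    "citation_abstract_html_url": "http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976",
}


def meta_tags(fields):
    """Render one <meta> element per value; lists produce repeated tags."""
    lines = []
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item:  # skip empty strings so blank fields emit nothing
                lines.append(
                    f'<meta name="{escape(name)}" content="{escape(item)}">'
                )
    return "\n".join(lines)


print(meta_tags(record))
```

The output would go in the head of the item's landing page; escape() keeps quotes and ampersands in titles from breaking the attribute values.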
48. Describe concepts using Schema.org to help SEs understand your repository
- Answer Questions
  - What type of WebPage?
  - What content / data does the page contain?
  - Who was involved?
    - Organizations?
    - People?
  - Where is it?
- Look at the properties to see if the concept applies
49. WebPage concepts relevant to digital repositories
- CreativeWork > WebPage*
- WebPage Classes
  - SearchResultsPage
  - CollectionPage
    - ImageGallery
    - VideoGallery
  - ItemPage
- Important Properties
  - description
  - breadcrumb
  - isPartOf
  - significantLink
  - significantLinks
* http://schema.org/WebPage
51. Organizations might be relevant
- Organization*
  - EducationalOrganization
    - CollegeOrUniversity
  - LocalBusiness
    - Library**
- Important Properties
  - member
  - employee
  - contactPoint
* http://schema.org/Organization
** http://schema.org/Library
52. What People might be relevant
- Person*
- Important Properties
  - memberOf
  - worksFor
  - jobTitle
  - email
  - affiliation
  - alumniOf
* http://schema.org/Person
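To show how a type and its properties map onto markup, here is a minimal Python sketch that renders a schema.org item as nested microdata. The helper name and the dictionary layout are hypothetical; the person details are the presenter details given in this deck, and worksFor is used only as an example of a nested item.

```python
from html import escape


def render_item(item_type, properties, prop=None, indent=0):
    """Render a schema.org item as nested microdata <div>/<span> markup.

    `properties` maps property names to either a string or a nested
    (item_type, properties) tuple.
    """
    pad = "  " * indent
    prop_attr = f' itemprop="{escape(prop)}"' if prop else ""
    lines = [
        f'{pad}<div{prop_attr} itemscope '
        f'itemtype="http://schema.org/{escape(item_type)}">'
    ]
    for name, value in properties.items():
        if isinstance(value, tuple):
            nested_type, nested_props = value
            lines.append(render_item(nested_type, nested_props, name, indent + 1))
        else:
            lines.append(
                f'{pad}  <span itemprop="{escape(name)}">{escape(value)}</span>'
            )
    lines.append(f"{pad}</div>")
    return "\n".join(lines)


person = {
    "name": "Patrick OBrien",
    "jobTitle": "Semantic Web Research Director",
    "email": "patrick.obrien4@montana.edu",
    "worksFor": ("CollegeOrUniversity", {"name": "Montana State University"}),
}
print(render_item("Person", person))
```

Nesting the organization inside the worksFor property is what lets a search engine connect the two items instead of reading them as unrelated.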
53. What locations might be relevant?
- Place*
  - LandmarksOrHistoricalBuildings
- Intangible > StructuredValue
  - GeoCoordinates
- Important Properties
  - geo
  - photo
  - address
  - containedIn
* http://schema.org/Place
54. Check your work using Google Rich Snippet Tool
<head>
  <title>Search engine optimization (SEO) for digital repositories</title>
</head>
<!-- The page as a whole is described as a schema.org WebPage -->
<body itemscope itemtype="http://schema.org/WebPage">
  <!-- Breadcrumb trail for the item -->
  <div itemprop="breadcrumb">
    <a href="category/ir.html">USpace Institutional Repository</a> &gt;
    <a href="category/CollegeofSocialBehavioralScience.html">University Libraries</a> &gt;
    <a href="category/books-literature.html">J. Willard Marriott Library</a>
  </div>
  <!-- The paper that this page presents -->
  <div itemscope itemtype="http://schema.org/ScholarlyArticle">
    <span itemprop="name">Search engine optimization (SEO) for digital repositories</span>
    <!-- itemprop="author" attaches the Person to the article -->
    <div itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Patrick OBrien</span>
      <a href="http://www.linkedin.com/in/obrienpatricks" itemprop="url">Patrick OBrien Resume</a>
      <span itemprop="jobTitle">Semantic Web Research Director</span>
      <div itemprop="affiliation" itemscope itemtype="http://schema.org/CollegeOrUniversity">
        <span itemprop="name">Montana State University Library</span>
      </div>
      <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
        <!-- itemprop on <a> takes its value from href, so the name sits on a nested span -->
        <a href="http://www.RevXcorp.com" itemprop="url"><span itemprop="name">RevX Corporation</span></a>
      </div>
    </div>
  </div>
</body>
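Before pasting markup into Google's tool, a quick local outline of the types and properties can catch obvious slips such as a misspelled itemprop. The Python sketch below lists itemprop and itemtype attributes in document order using the standard library HTML parser. It is a rough check under stated assumptions, not a spec-compliant microdata parser, and the class name is hypothetical; the snippet it reads is the Person block from the slide.

```python
from html.parser import HTMLParser


class MicrodataOutline(HTMLParser):
    """Collect itemprop and itemtype attributes in document order.

    This is only a rough outline, not a spec-compliant microdata parser.
    """

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemprop" in attributes:
            self.found.append(("itemprop", attributes["itemprop"]))
        if "itemtype" in attributes:
            self.found.append(("itemtype", attributes["itemtype"]))


snippet = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Patrick OBrien</span>
  <span itemprop="jobTitle">Semantic Web Research Director</span>
  <div itemprop="affiliation" itemscope itemtype="http://schema.org/CollegeOrUniversity">
    <span itemprop="name">Montana State University Library</span>
  </div>
</div>
"""

outline = MicrodataOutline()
outline.feed(snippet)
for kind, value in outline.found:
    print(kind, value)
```

The printed list can be compared by eye against the property names on schema.org; it does not validate nesting or values.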
55. Questions & Study Participation?
Kenning Arlitsch
Dean of the Library at Montana State University
kenning.arlitsch@montana.edu
Patrick OBrien
Semantic Web Research Director
patrick.obrien4@montana.edu