The document discusses the Marriott Library's search engine optimization (SEO) program and its efforts to increase indexing of digital collections in Google and Google Scholar. While indexing ratios in Google increased significantly, indexing in Google Scholar remained very low across institutions. The library surveyed other repositories and found most used indirect URLs that made indexing difficult. Addressing this issue by transforming metadata schemas could help solve the low indexing ratio problem in Google Scholar.
Improving Institutional Repository Search Engine Visibility in Google and Google Scholar
1. Invisible Institutional Repositories: Addressing the Low Indexing Ratio of IRs in Google Scholar by Transforming Metadata Schema
Kenning Arlitsch & Patrick OBrien
October 31, 2011
2011 Fall DLF, Baltimore, MD
2. Today’s Objectives
- Discuss Marriott Library SEO program
  - Program Priorities & Results
  - Issues & Opportunity
  - Google Scholar
3. Marriott Library SEO program priorities
- Digital repositories vs. general websites
  - Millions of objects in databases
  - Include IR
- Priority 1 – Increase Reach
  - Get objects indexed in search engines
- Priority 2 – Increase Visibility
  - Provide robust descriptive content
4. Collection Google Index Ratios have increased across the board…
Google Index Ratio - All Collections*
Date | Average | High**
07/05/10 | 12% | 37%
04/04/11 | 51% | 87%
10/16/11 | 74% | 100%
* Google Index Ratio = URLs indexed by Google / URLs submitted, for about 150 collections containing ~170,00 URLs
**Highest index ratio achieved for Collections with over 500 URLs submitted to Google
6. However, Google Scholar Index Ratios??
Google Scholar Index Ratio: 0%
You can find Marriott IR papers in Google now, but cannot find them in Google Scholar. Why?
9. College Students Begin Research - 2010
DeRosa, Cathy, et al. “Perceptions of Libraries, 2010: Context and Community: A Report to the OCLC Membership”, OCLC, 2010.
11. Marriott Library Management Experiences
- Large digital collections built over a decade
  - 1.3+ million items
- Why weren’t we getting indexed?
  - Harvesting/indexing rates as low as 8%
  - Non-existent IR showing in Google Scholar
- Sitemaps generated for Google
12. MWDL Repositories Survey: % w/ Indirect URL
[Bar chart, October 2010; axis 0% to 100%; bar values not recoverable. Repositories in chart order:]
- Utah Digital Newspapers Repository
- University of Nevada, Reno
- University of Utah
- Southern Utah University
- Brigham Young University
- Utah State University
- Utah State Archives
- Utah State University
- Utah Valley University
- Weber State University
- Health Education Assets Library
- University of Nevada, Las Vegas
- Utah State Library
13. MWDL Repositories Survey: % w/ Direct URL
[Bar chart, October 2010; axis 0% to 100%; bar values not recoverable. Repositories in chart order:]
- University of Nevada, Reno
- Utah State University
- University of Utah
- Utah State University
- University of Nevada, Las Vegas
- Utah Valley University
- Brigham Young University
- Weber State University
- Health Education Assets Library
- Southern Utah University
- Utah State Library
- Utah State Archives
- Utah Digital Newspapers Repository
14. Literature Lessons
- Most are dated
- Most deal with general websites
- Few deal with digital collections in databases
- Some suggest duplicating the content outside the database
16. Why does Google Scholar Matter??
- “researchers find Google and Google Scholar to be amazingly effective” and accept the results as “good enough in many cases” (Kroll & Forsman 2010)
- “broader awareness of specialized Google tools such as Google Scholar and Google Book among faculty members and graduate students” (Rieger 2009)
- “the amount of qualified scholarly content has increased considerably in Google Scholar since it was launched in 2004” (Mikki 2009)
- 4% - 27% use increase in four-year U Miss study (Herrera 2010)
17. USpace IR Google Index Ratios baseline
Google Index Ratio, baseline measured 07/05/10
Collection | Google Index Ratio
ETD 1 | 12%
ETD 2 | 0%
UScholar Works | 23%
Board of Regents | 4%
*Weighted Average Google Index Ratio = 18.33% (1,188/6,482)
18. USpace IR Google Index Ratios baseline
[Chart: the baseline Google Index Ratios from the previous slide, shown next to the Google Scholar Index Ratio]
Google Scholar Index Ratio: 0%
19. Low GS indexing ratios cut across institutions
Google Scholar Indexing Ratio for Selected Institutional and Disciplinary Repositories, October 2011
Repository | Indexing Ratio
Baylor U - BearDocs | 89%
Digital Commons@UNLincoln | 60%
Virginia Tech - CS Tech Reports | 60%
Aquatic Commons | 56%
Cornell - arXiv | 47%
Cornell - Digital Commons@ILR | 40%
IUPUI Scholar | 38%
BYU Scholars Archive | 34%
Michigan - Deep Blue | 34%
Univ of Oregon - Scholars Bank | 29%
Harvard Univ - DASH | 28%
eCommons@Cornell | 18%
UW Madison - Minds@UW | 17%
Texas A&M Repository | 16%
IU Scholarworks | 13%
Columbia Univ - Academic | 13%
D-Scholarship@Pitt | 12%
CaltechAuthors | 10%
Univ of Rochester Research | 6%
UW - ResearchWorks Archive | 3%
20. Survey Methodology Key Points
- Selected from OpenDOAR
  - Only IRs from the U.S.
    - “Pure” institutional or disciplinary repositories
  - Different software types
    - DSpace, Digital Commons, EPrints, IR+, CONTENTdm, DigiTool, arXiv
- Calculated total items in each repository
- Site operator search
  - site:repositoryURL
  - Shows approximation
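The site-operator method above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tooling: the function names are hypothetical, the Google Scholar search URL pattern is an assumption, and the hit count still has to be read off the results page by hand because, as the slide notes, it is only an approximation. The sample figures are the Baylor BearDocs counts reported in the survey results (829 items in Google Scholar out of 928 repository items).

```python
from urllib.parse import urlencode


def scholar_site_query_url(repository_url: str) -> str:
    """Build a Google Scholar search URL that uses the site: operator.

    The base URL is an assumption made for illustration.
    """
    params = urlencode({"q": f"site:{repository_url}"})
    return f"https://scholar.google.com/scholar?{params}"


def indexing_ratio(items_in_scholar: int, repository_items: int) -> float:
    """Share of repository items that the site: search reports as indexed."""
    if repository_items <= 0:
        raise ValueError("repository_items must be positive")
    return items_in_scholar / repository_items


if __name__ == "__main__":
    # Query to run by hand, then the ratio for the counts reported in the survey.
    print(scholar_site_query_url("beardocs.baylor.edu"))
    print(f"{indexing_ratio(829, 928):.0%}")
```

Dividing the approximate hit count by the repository's own item total keeps the comparison across repositories simple, at the cost of inheriting whatever error the site: estimate carries.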
22. Repository software does not appear to be the deciding factor
Repository Name | Repository Software | Repository URL | Repository items | Items in Google Scholar | Indexing Ratio
Boston College - eScholarship@BC | DigiTool | dcollections.bc.edu | 1,635 | 1 | 0%
UW - ResearchWorks Archive | DSpace | digital.lib.washington.edu/dspace | 11,285 | 304 | 3%
Univ of Rochester Research | IR+ | urresearch.rochester.edu | 16,184 | 983 | 6%
CaltechAuthors | EPrints | authors.library.caltech.edu | 22,000 | 2,290 | 10%
D-Scholarship@Pitt | EPrints | d-scholarship.pitt.edu | 5,888 | 686 | 12%
Columbia Univ - Academic Commons | Digital Commons | academiccommons.columbia.edu | 4,631 | 586 | 13%
IU Scholarworks | DSpace | scholarworks.iu.edu/dspace | 7,782 | 1,030 | 13%
Texas A&M Repository | DSpace | repository.tamu.edu | 46,324 | 7,250 | 16%
UW Madison - Minds@UW | DSpace | minds.wisconsin.edu | 15,078 | 2,520 | 17%
eCommons@Cornell | DSpace | ecommons.library.cornell.edu | 18,544 | 3,410 | 18%
Harvard Univ - DASH | DSpace | dash.harvard.edu | 6,193 | 1,710 | 28%
Univ of Oregon - Scholars Bank | DSpace | scholarsbank.uoregon.edu/xmlui | 9,740 | 2,840 | 29%
Michigan - Deep Blue | DSpace | deepblue.lib.umich.edu | 66,038 | 22,200 | 34%
BYU Scholars Archive | CONTENTdm | scholarsarchive.lib.byu.edu | 7,421 | 2,520 | 34%
IUPUI Scholar | DSpace | scholarworks.iupui.edu | 2,109 | 800 | 38%
Cornell - Digital Commons@ILR | Digital Commons | digitalcommons.ilr.cornell.edu | 14,669 | 5,880 | 40%
Cornell - arXiv | Other (arXiv) | arxiv.org | 706,906 | 330,000 | 47%
Aquatic Commons | EPrints | aquaticcommons.org | 5,722 | 3,230 | 56%
Virginia Tech - CS Tech Reports | EPrints | eprints.cs.vt.edu | 983 | 586 | 60%
Digital Commons@UNLincoln | Digital Commons | digitalcommons.unl.edu | 50,657 | 30,200 | 60%
Baylor U - BearDocs | DSpace | beardocs.baylor.edu | 928 | 829 | 89%
23. Google Scholar wants the right metadata tags used consistently and accurately.
“Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers...”
- Google Scholar Inclusion Guidelines for Webmasters
“… there's a good chance that many of your papers aren't included at all, because documents with the same title are often considered duplicates.”
- Google Scholar Inclusion Guidelines for Webmasters
“… incorrect identification of references could lead to exclusion of your papers from Google Scholar or to low ranking of your papers in the search results.”
- Google Scholar Inclusion Guidelines for Webmasters
“…the most common cause of indexing problems is incorrect extraction of bibliographic data by the automated parser software.”
- Google Scholar Inclusion Guidelines for Webmasters
24. Challenge is presenting bibliographic citations GS can identify, parse and digest
Title: Thanks for nothing: changes in income and labor force participation for never-married mothers since 1982
University of Utah creator: Wolfinger, Nicholas H.
Other Creator: McKeever, Matthew
Subject.Keyword: Motherhood; Single Mothers; Income; Population surveys;
Subject.LCSH: Single mothers; Income
Description: This study examines whether the changing social and economic characteristics of women who give birth out of wedlock have led to higher family incomes. Using Current Population Survey data collected between 1982 and 2002, we find that never-married mothers remain poor. They have made modest economic gains, but these have disproportionately occurred at the top of the income distribution. Yet there is no evidence of a burgeoning class of "Murphy Browns," middle-class professional women who give birth out of wedlock. Surprisingly, never-married mothers' incomes have stagnated in spite of impressive gains in education and other personal and vocational characteristics that should have resulted in greater economic progress than has been the case. These gains cast doubt on various stereotypes about women who give birth out of wedlock.
Publisher: University of Utah
Date.Original: 2006-07-26
Type: Text
Format.Extent: 370,155 Bytes
Format.Medium: application/pdf
Resource Identifier: ir-main,824
Language: eng
Series: Institute of Public and International Affairs Working Papers
Relation: McKeever, M. & Wolfinger, N.H. (2006). Thanks for Nothing: Changes in Income and Labor Force Participation Never-Married Mothers since 1982. Institute of Public & International Affairs (IPIA), 4, 1-43.
Rights Management: (c) Matthew McKeever and Nicholas H. Wolfinger
Research Institute: Institute of Public and International Affairs (IPIA)
Department: Family & Consumer Studies; Sociology
School / College: College of Social & Behavioral Science
Contributing Institution: University of Utah
Publication Type: working paper
25. First step was to begin aligning Highwire Press meta tags with existing Dublin Core fields
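As an illustration of what such an alignment might look like, here is a minimal Python sketch that turns a few Dublin Core style fields into Highwire Press `<meta>` tags. The crosswalk dictionary and the function name are hypothetical, not the library's actual mapping; only the `citation_*` tag names and the sample values come from the slides. Emitting one `citation_author` tag per author is an assumption about how the tags are best written, so check it against the Google Scholar inclusion guidelines.

```python
from html import escape

# Hypothetical crosswalk from repository field labels to Highwire Press tags.
DC_TO_HIGHWIRE = {
    "Title": "citation_title",
    "Date.Original": "citation_date",
    "Subject.Keyword": "citation_keywords",
}


def highwire_meta_tags(record, authors):
    """Return HTML <meta> tag strings for one repository record."""
    tags = []
    # Assumption: one citation_author tag per author, in citation order.
    for author in authors:
        tags.append(f'<meta name="citation_author" content="{escape(author)}">')
    for field, tag_name in DC_TO_HIGHWIRE.items():
        value = record.get(field)
        if value:  # skip fields the record does not carry
            tags.append(f'<meta name="{tag_name}" content="{escape(value)}">')
    return tags


if __name__ == "__main__":
    record = {
        "Title": "Thanks for nothing: changes in income and labor force "
                 "participation for never-married mothers since 1982",
        "Date.Original": "2006-07-26",
        "Subject.Keyword": "Motherhood; Single Mothers; Income; Population surveys;",
    }
    authors = ["Wolfinger, Nicholas H.", "McKeever, Matthew"]
    for tag in highwire_meta_tags(record, authors):
        print(tag)
```

Keeping the crosswalk in one dictionary makes it easy to review field by field, which matters because the tag set differs by publication type.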
27. Google Scholar Pilot 1 tested importance of Metadata model
- 6,482 URLs in Sitemaps submitted via Google Webmaster Tools.
- Errors generated during Google crawls were analyzed and addressed.
- Updated & corrected metadata for 20 pilot articles
  - Ensured full-text PDF met GS inclusion guideline requirements.
  - Provided a “landing page” per GS inclusion guidelines, containing links to the 20 IR pilot papers, that was within a few clicks of the home page.
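For readers unfamiliar with Sitemaps, the following is a minimal Python sketch of how a list of item URLs could be written out in the sitemaps.org XML format before submission. It is not the library's actual generator: the function name is hypothetical and the two sample URLs are illustrative item addresses in the style used elsewhere in this deck.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Illustrative URLs only; a real run would list every item in the repository.
    sample_urls = [
        "http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/7/rec/1",
        "http://cdm6gs.lib.utah.edu/cdm/ref/collection/uspace/id/1",
    ]
    print(build_sitemap(sample_urls))
```

The protocol allows up to 50,000 URLs per sitemap file, so a set of 6,482 URLs fits in a single file.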
28. USpace IR Google Index Ratios increased
Google Index Ratio
Collection | 07/05/10 | 11/19/10 | 10/16/11
ETD 1 | 12% | 69% | 97%
ETD 2 | 0% | 68% | 98%
UScholar Works | 23% | 51% | 98%
Board of Regents | 4% | 47% | 97%
*October 16, 2011 Weighted Average Google Index Ratio = 97.82% (10,306/10,536).
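The weighted averages quoted in the footnotes can be reproduced by pooling counts across collections, that is, dividing total indexed URLs by total submitted URLs rather than averaging the per-collection percentages. The sketch below is a minimal illustration with a hypothetical function name; the slides give only the pooled totals (1,188/6,482 at baseline and 10,306/10,536 on October 16, 2011), so those are the only figures used.

```python
def weighted_index_ratio(counts):
    """Pooled index ratio for (indexed, submitted) pairs, one pair per collection."""
    pairs = list(counts)
    total_indexed = sum(indexed for indexed, _ in pairs)
    total_submitted = sum(submitted for _, submitted in pairs)
    if total_submitted == 0:
        raise ValueError("no URLs submitted")
    return total_indexed / total_submitted


if __name__ == "__main__":
    # Pooled totals as reported in the slide footnotes.
    print(f"{weighted_index_ratio([(1188, 6482)]):.2%}")    # baseline
    print(f"{weighted_index_ratio([(10306, 10536)]):.2%}")  # October 16, 2011
```

Pooling weights each collection by its size, so a large collection with a low ratio pulls the average down more than a small one does.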
29. USpace IR Google Index Ratios increased
[Chart: the Google Index Ratios from the previous slide, shown next to the Google Scholar Index Ratio]
Google Scholar Index Ratio: 0%
30. GS Pilot 2 Utilized OCLC’s relationship with Google Scholar
- 19 Papers in GS Pilot 2
  - 6 of 7 GS paper types represented
  - 19 Full Text PDFs
- Augmented CONTENTdm v.6
  - Highwire Press Meta tags
  - Browse By Year
  - Recently Added
  - College & Department
Google Scholar Index Ratio: 62%
31. A Pre-Print Author Manuscript is not the Journal Article.
Meta Tag | Pre-Print | Journal Article
1 - citation_author | Maloney, Krisellen; Antelman, Kristin; Arlitsch, Kenning; Butler, John | Maloney, Krisellen; Antelman, Kristin; Arlitsch, Kenning; Butler, John
2 - citation_date | 2009 | 2010
3 - citation_title | Future leaders' views on organizational culture | Future leaders' views on organizational culture
4 - citation_publisher | N/A | Association of College & Research Libraries
5 - citation_journal_title | N/A | College and Research Libraries
6 - citation_volume | | 71
7 - citation_issue | | 4
8 - citation_firstpage | 1 | 322
9 - citation_lastpage | 56 | 347
10 - citation_doi | |
11 - citation_issn | |
12 - citation_isbn | |
13 - citation_keywords | Organizational culture | Organizational culture
16 - citation_technical_report_institution | Uspace Institutional Repository, University of Utah | N/A
17 - citation_technical_report_number | | N/A
18 - citation_language | en | en
21 - citation_pdf_url | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/10/filename/3.pdf | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/16/filename/17.pdf
22 - citation_abstract_html_url | http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/10/rec/1 | http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/16/rec/2
Not Relevant
14 - citation_dissertation_institution
15 - citation_dissertation_name
19 - citation_conference_title
20 - citation_inbook_title
32. A minor nuance is the difference between Books and Book Chapters
Meta Tag | Book Chapter | Book
1 - citation_author | Riloff, Ellen M. | Ram, Ashwin
2 - citation_date | 1999 | 1999
3 - citation_title | Information extraction as a stepping stone toward story understanding | Understanding Language: Understanding Computational Models of Reading
4 - citation_publisher | MIT Press | MIT Press
8 - citation_firstpage | 435 | 1
9 - citation_lastpage | 460 | 519
12 - citation_isbn | 0-262-18192-4 | 0-262-18192-4
13 - citation_keywords | Information extraction; Story understanding; | Information extraction; Story understanding;
18 - citation_language | en | en
20 - citation_inbook_title | Understanding Language: Understanding Computational Models of Reading | N/A
21 - citation_pdf_url | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/9/filename/5.pdf |
22 - citation_abstract_html_url | http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/9/rec/1 |
Not Relevant
5 - citation_journal_title
6 - citation_volume
7 - citation_issue
10 - citation_doi
11 - citation_issn
14 - citation_dissertation_institution
15 - citation_dissertation_name
16 - citation_technical_report_institution
17 - citation_technical_report_number
19 - citation_conference_title
33. ETDs use very different metadata tags
Meta Tag | PhD | Masters
1 - citation_author | Rague, Brian William | Wu, Shangduan
2 - citation_date | 2010/08 | 2010/07
3 - citation_title | A CS1 pedagogical approach to parallel thinking | Electronic structure and transport property of disordered graphene
8 - citation_firstpage | 1 | 1
9 - citation_lastpage | 234 | 84
13 - citation_keywords | Computer; CS1; Education; Parallel; Programming; | Disorder; Electronic structure; Graphene; Transport property; Electronic structure;
14 - citation_dissertation_institution | University of Utah, College of Engineering | University of Utah, College of Science
15 - citation_dissertation_name | PhD | MS
18 - citation_language | en | en
21 - citation_pdf_url | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/5/filename/19.pdf | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/0/filename/4.pdf
22 - citation_abstract_html_url | http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/5/rec/1 | http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/0/rec/1
Not Relevant
4 - citation_publisher
5 - citation_journal_title
6 - citation_volume
7 - citation_issue
10 - citation_doi
11 - citation_issn
12 - citation_isbn
16 - citation_technical_report_institution
17 - citation_technical_report_number
19 - citation_conference_title
20 - citation_inbook_title
34. Working papers have a unique combination of metadata tags.
Meta Tag | Working Paper
1 - citation_author | Wolfinger, Nicholas H.; McKeever, Matthew
2 - citation_date | 2006-07-26
3 - citation_title | Thanks for nothing: changes in income and labor force participation for never-married mothers since 1982
6 - citation_volume |
7 - citation_issue |
8 - citation_firstpage | 1
9 - citation_lastpage | 43
10 - citation_doi |
13 - citation_keywords | Motherhood; Single Mothers; Income; Population surveys;
16 - citation_technical_report_institution | Institute of Public & International Affairs (IPIA), University of Utah
17 - citation_technical_report_number | 2006-07-04
18 - citation_language | en
19 - citation_conference_title | 101st American Sociological Association (ASA) Annual Meeting; 2006 Aug 11-14; Montreal, Canada
21 - citation_pdf_url | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/7/filename/21.pdf
22 - citation_abstract_html_url | http://cdm6gs.lib.utah.edu/cdm/singleitem/collection/uspace/id/7/rec/1
Not Relevant
4 - citation_publisher
5 - citation_journal_title
11 - citation_issn
12 - citation_isbn
14 - citation_dissertation_institution
15 - citation_dissertation_name
20 - citation_inbook_title
35. Conference Articles may or may not have published proceedings
Meta Tag | Conference Article
1 - citation_author | Balasubramonian, Rajeev; Awasthi, Manu; Sudan, Kshitij; Carter, John
2 - citation_date | 2009/02/14
3 - citation_title | Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches
4 - citation_publisher | Institute of Electrical and Electronics Engineers (IEEE)
5 - citation_journal_title | High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on
6 - citation_volume |
7 - citation_issue |
8 - citation_firstpage | 250
9 - citation_lastpage | 261
10 - citation_doi | 10.1109/HPCA.2009.4798260
11 - citation_issn | 1530-0897
12 - citation_isbn | 978-1-4244-2932-5
13 - citation_keywords | Page coloring; Shadow-memory addresses; Cache capacity allocation; Data/page migration
18 - citation_language | en
19 - citation_conference_title | 15th International Symposium on High Performance Computer Architecture (HPCA-15 2009) [14-18 Feb. 2009, Raleigh, NC, USA]
21 - citation_pdf_url | http://cdm6gs.lib.utah.edu/utils/getfile/collection/uspace/id/1/filename/11.pdf
22 - citation_abstract_html_url | http://cdm6gs.lib.utah.edu/cdm/ref/collection/uspace/id/1
Not Relevant
14 - citation_dissertation_institution
15 - citation_dissertation_name
16 - citation_technical_report_institution
17 - citation_technical_report_number
20 - citation_inbook_title
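The per-type "Not Relevant" lists on the preceding slides amount to a lookup table: given a publication type, which of the 22 numbered tags should a record template offer? The sketch below encodes that table in Python. The dictionary keys and the function name are hypothetical labels chosen for illustration; the tag numbers, tag names, and exclusions are taken from the slides.

```python
# Tag numbers and names as listed on the slides.
TAGS = {
    1: "citation_author", 2: "citation_date", 3: "citation_title",
    4: "citation_publisher", 5: "citation_journal_title", 6: "citation_volume",
    7: "citation_issue", 8: "citation_firstpage", 9: "citation_lastpage",
    10: "citation_doi", 11: "citation_issn", 12: "citation_isbn",
    13: "citation_keywords", 14: "citation_dissertation_institution",
    15: "citation_dissertation_name", 16: "citation_technical_report_institution",
    17: "citation_technical_report_number", 18: "citation_language",
    19: "citation_conference_title", 20: "citation_inbook_title",
    21: "citation_pdf_url", 22: "citation_abstract_html_url",
}

# Tags each slide marks "Not Relevant"; the type labels are hypothetical.
NOT_RELEVANT = {
    "pre-print / journal article": {14, 15, 19, 20},
    "book / book chapter": {5, 6, 7, 10, 11, 14, 15, 16, 17, 19},
    "thesis / dissertation": {4, 5, 6, 7, 10, 11, 12, 16, 17, 19, 20},
    "working paper": {4, 5, 11, 12, 14, 15, 20},
    "conference article": {14, 15, 16, 17, 20},
}


def relevant_tags(publication_type):
    """Return tag names that apply to a publication type, in slide order."""
    excluded = NOT_RELEVANT[publication_type]
    return [TAGS[n] for n in sorted(TAGS) if n not in excluded]


if __name__ == "__main__":
    for tag in relevant_tags("thesis / dissertation"):
        print(tag)
```

A table like this could drive metadata entry forms so that catalogers only see the fields that apply to the item in hand.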
36. Questions?
Kenning Arlitsch
kenning.arlitsch@utah.edu
Patrick OBrien
www.RevXcorp.com
Patrick.OBrien@utah.edu
805.509.2586