Presentation on April 23, 2015 at the Amigos Library Services online conference: "Linked Data & RDF: New Frontiers in Metadata and Access"
Covers traditional SEO and Semantic Web Optimization, including Semantic Web Identity and a Schema.org project at Montana State University Library.
Walk Before You Run: Prerequisites to Linked Data
1. Walk Before You Run
Prerequisites to Linked Data
Kenning Arlitsch
Dean of the Library
@kenning_msu
2. Linked Data applications will not matter if search engines can’t
find library websites and repositories, crawl them, and
understand the metadata provided.
First, Take Care of Basics
3. Agenda
Traditional SEO (Search Engine Optimization)
– Hardware, software, websites, metadata
Semantic Web Optimization
– Semantic Identity
– Schema.org Project at MSU
• Using a vocabulary understood by search engines
• Improve machine comprehension
4. Funded Research
• 2011-2014
– “Getting Found: Search Engine Optimization for Digital
Repositories”
• 2014-2017
– “Measuring Up: Assessing Accuracy of Reported Use and Impact
of Digital Repositories”
– Partners
• OCLC Research
• Association of Research Libraries
• University of New Mexico
8. DeRosa, Cathy, et al. “Perceptions of Libraries, 2010: Context and
Community: A Report to the OCLC Membership”, OCLC, 2010.
Where College Students Begin Research - 2010
10. Our Research Inspiration
• Decade building digital libraries - Univ of Utah
– Mountain West Digital Library
– Utah Digital Newspapers
– Western Waters Digital Library
– Western Soundscape Archive
• Were they being used…?
11. Uh, not really…
• 2010 situation at Utah
– 12% of digital collections indexed by Google
– 0.5% of Utah’s IR scholarly papers accessible via
Google Scholar
12. Basic SEO began producing significant increases in the
average number of page views per day…
Avg. Page Views / Day content.lib.utah.edu
13. Basic SEO improved Utah’s collection
accessibility in Google…
Google Index Ratio - All Collections* (bar chart, average across collections)
• 07/05/10: 12%
• 04/04/11: 51%
• 11/30/11: 79%
• 12/05/13: 92%
* Google Index Ratio = URLs indexed by Google / URLs submitted
** ~150 collections containing ~170,000 URLs (07/2010) and ~170 collections containing ~282,000 URLs (12/2013)
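The index ratio can be sketched in a few lines of Python. This is a minimal illustration, assuming the ratio is indexed URLs divided by submitted URLs, which matches the 10,306/10,536 = 97.82% figure the deck reports for the USpace repository. The function names and the per-collection counts in the second call are hypothetical.

```python
def index_ratio(urls_indexed: int, urls_submitted: int) -> float:
    """Share of submitted URLs that the search engine reports as indexed."""
    if urls_submitted <= 0:
        raise ValueError("urls_submitted must be positive")
    return urls_indexed / urls_submitted


def weighted_index_ratio(collections: list[tuple[int, int]]) -> float:
    """Weighted average over (indexed, submitted) pairs.

    Summing before dividing lets large collections count for more
    than small ones, unlike a plain mean of per-collection ratios.
    """
    total_indexed = sum(indexed for indexed, _ in collections)
    total_submitted = sum(submitted for _, submitted in collections)
    return index_ratio(total_indexed, total_submitted)


# Figure quoted in the deck for October 16, 2011
print(f"{index_ratio(10_306, 10_536):.2%}")

# Hypothetical per-collection counts, for illustration only
print(f"{weighted_index_ratio([(90, 100), (450, 900)]):.2%}")
```

The weighted form is the one to report across a repository, since a small, fully indexed collection would otherwise hide a large, poorly indexed one.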
14. …resulting in more referrals and visitors
12 week comparison 2010 vs. 2012
15. Technical Barriers to SE Crawlers
• Website Design
– Graphics
– Confusing site hierarchies and paths
• Slow servers
• CMSs often lack canonical links
• Metadata
– Schema not understood by SE
– Not unique
– Inconsistent/inaccurate
16. Nearly 100% USpace IR content indexed in Google
Google Index Ratio by collection (bar chart)
• Collections: Board of Regents, UScholar Works, ETD 2, ETD 1
• Dates measured: 07/05/10, 11/19/10, 10/16/11
• Bar values as plotted: 97%, 98%, 98%, 97%; 47%, 51%, 68%, 69%; 4%, 23%, 0%, 12%
Google Scholar Index Ratio: ~0%
* October 16, 2011 Weighted Average Google Index Ratio = 97.82% (10,306/10,536).
17. Challenge is presenting structured data that SEs
can identify, parse and digest
Wolfinger, N. H., & McKeever, M. (2006, July). Thanks for nothing: changes in income and
labor force participation for never-married mothers since 1982. In 101st American
Sociological Association (ASA) Annual Meeting; 2006 Aug 11-14; Montreal, Canada (No.
2006-07-04, pp. 1-42). Institute of Public & International Affairs (IPIA), University of
Utah.
Human Readable
Google Scholar Understandable
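One way to make the citation above machine-readable is to emit the `citation_*` meta tags described in Google Scholar's inclusion guidelines. The sketch below is an assumption about what the "Google Scholar Understandable" version looks like; the deck does not show the markup it used. The record values come from the citation on this slide, and the function name and dictionary layout are hypothetical.

```python
from html import escape


def scholar_meta_tags(record: dict) -> str:
    """Render a bibliographic record as Google Scholar style meta tags."""

    def meta(name: str, content: str) -> str:
        return f'<meta name="{name}" content="{escape(content, quote=True)}">'

    tags = [meta("citation_title", record["title"])]
    # One citation_author tag per author, in the order they appear on the paper
    tags += [meta("citation_author", author) for author in record["authors"]]
    tags.append(meta("citation_publication_date", record["year"]))
    tags.append(meta("citation_conference_title", record["conference"]))
    return "\n".join(tags)


record = {
    "title": ("Thanks for nothing: changes in income and labor force "
              "participation for never-married mothers since 1982"),
    "authors": ["Wolfinger, N. H.", "McKeever, M."],
    "year": "2006",
    "conference": "101st American Sociological Association (ASA) Annual Meeting",
}
print(scholar_meta_tags(record))
```

Each field sits in its own tag, so a crawler does not have to guess where the author list ends and the title begins, which is the parsing problem a free-text citation creates.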
19. SEO Organizational/Cultural Themes
• Traditional SEO is an afterthought
• Librarians think too small re potential traffic
• Organizational communication is poor
• Analytics are usually poorly implemented
• Vendors are slow to catch on to SEO problems
– Because we don’t demand it
20. Recommended SEO Process
1. Institutionalize SEO
● Strategic Planning
● Accurate Measurement Tools
2. Traditional SEO
● Get Indexed = Index Ratio
● Get Visible = Search Engine Results Page (SERP)
21. Advanced SEO Programs
3. Semantic SEO
● Get Relevant = Click Through Ratios (CTR)
● Semantic Identity
● Schema.org for Libraries
● Linked Open Data (LOD)
4. Social Media Optimization
● Faculty Outreach
23. Current Situation
Academic organizations are poorly represented on the
Semantic Web…
…because search engines don’t understand them…
…because we don’t maintain the data sources
search engines trust.
24. Affects reputation of the entire
academic institution
Colleges
Departments Centers Institutes
26. Google’s Knowledge Graph
The Web is moving from “strings” to “things”
“A knowledge base … to enhance search results with
semantic-search information gathered from a wide
variety of sources”
Source: Wikipedia
27. Knowledge Graph Products
• Answer Box
– Facts about concepts
• Carousel
– Group of instances that comprise a concept
• Knowledge Card
– Displays information about organizations and people
30. Lack of a Knowledge Card in search results is indicative
of a larger problem…
…it means Google doesn’t understand that the
organization exists or what its business is…
…and as a result Google is unlikely to connect
users with the organization’s website
31. Survey of ARL Libraries
• n=125
• Searched by name listed in ARL directory
• Knowledge Card? Yes/No
• Robustness scale of 1-5
35. Survey of ARL Libraries
No Knowledge Card at all
43
Have Knowledge Card
82
-10 incorrect
-29 (robustness of 1)
Total = 43 (82 - 10 - 29) accurate Knowledge Cards with robustness above 1
39. Trusted Sources for Search Engines
• No Wikipedia presence?
– Organization doesn’t exist as an “entity” or “thing”
– It exists as a string of (confusing) text
• Other influences on Google’s Knowledge Graph
– Freebase (phasing out in favor of Wikidata)
– Google Places/Google My Business
– Google+
54. Summary
• Define library organization in Wikipedia
– Beware of *pedia culture and process
• Engage with other trusted data sources
– Wikidata
– Google Places/Google My Business
– Google+
• Mark-up metadata with Schema.org
55. New Knowledge Work for Libraries
• Build set of replicable services
– Populate and maintain structured data records
– Add rich semantic markup to websites
• Communicate
– Understand ourselves from stakeholder
perspective
– Machine-understandable information
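As an illustration of "rich semantic markup" for a library's own identity, the sketch below builds a Schema.org `Library` description as JSON-LD and links it to trusted sources through `sameAs`. It is a minimal example under stated assumptions: the deck does not say which syntax or properties were used, and the library name, URLs, and function name here are placeholders.

```python
import json


def library_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a <script> block describing a library as a Schema.org entity."""
    data = {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": name,
        "url": url,
        # sameAs points at other records for the same entity,
        # such as its Wikipedia article or Wikidata item
        "sameAs": list(same_as),
    }
    # Escape "</" so the JSON cannot close the script element early
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values, not real records
print(library_jsonld(
    "Example University Library",
    "https://library.example.edu/",
    ["https://example.org/wikipedia-article",
     "https://example.org/wikidata-item"],
))
```

The `sameAs` links are what tie the website to the external records, so they are only useful once those records exist and are maintained.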
57. Schema.org
• Common vocabulary for describing things on web
• Supported by Bing, Google, Yahoo and Yandex
• “On-page markup helps search engines
understand the information on webpages and
provide richer results.”
• https://support.google.com/webmasters/answer/1211158?hl=en
58. Hypothesis
• Implementing Schema.org in library websites
– Improves machine understanding of content
– Improves rich snippets shown in SERP
– Increases click-through rates from SERP
• Result
– More traffic
– More users finding what they’re looking for
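The on-page markup this hypothesis refers to might look like the following for a single digital collection item. This is a hedged sketch using Schema.org microdata attributes; the deck does not state which syntax (microdata, RDFa, or JSON-LD) or which types the MSU project chose, and the sample values and function name are hypothetical.

```python
from html import escape


def item_microdata(name: str, description: str,
                   date_created: str, collection_name: str) -> str:
    """Render one item record as HTML annotated with Schema.org microdata."""
    date = escape(date_created, quote=True)
    return "\n".join([
        '<div itemscope itemtype="http://schema.org/Photograph">',
        f'  <h1 itemprop="name">{escape(name)}</h1>',
        f'  <p itemprop="description">{escape(description)}</p>',
        f'  <time itemprop="dateCreated" datetime="{date}">{date}</time>',
        # Nested item: the collection this photograph belongs to
        '  <div itemprop="isPartOf" itemscope '
        'itemtype="http://schema.org/Collection">',
        f'    <span itemprop="name">{escape(collection_name)}</span>',
        '  </div>',
        '</div>',
    ])


# Hypothetical record, for illustration only
print(item_microdata(
    name="Sample photograph title",
    description="Sample description of the photograph.",
    date_created="1910",
    collection_name="Sample Photograph Collection",
))
```

The visible text stays the same for human readers; the `itemprop` attributes only add labels that a crawler can map to a known vocabulary.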
59. Project: A Controlled Experiment
by Jason Clark (with Michelle Gollehon)
• Two digital collections
• Similar size/content/date range
– Photos and historical documents
• 1 optimized with Schema.org (Schultz)
• 1 control (Brook)
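Comparing the two collections comes down to click-through rate (clicks divided by impressions on the SERP) for the optimized collection against the control. The sketch below shows that arithmetic only; the counts are invented for illustration and are not results from the MSU experiment, and the function names are hypothetical.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks from the results page divided by times the result was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(treatment_ctr: float, control_ctr: float) -> float:
    """Relative change of the optimized collection's CTR over the control's."""
    if control_ctr <= 0:
        raise ValueError("control_ctr must be positive")
    return (treatment_ctr - control_ctr) / control_ctr


# Invented counts, NOT data from the experiment described in the deck
optimized = click_through_rate(clicks=60, impressions=1_000)
control = click_through_rate(clicks=40, impressions=1_000)
print(f"optimized CTR {optimized:.1%}, control CTR {control:.1%}, "
      f"lift {relative_lift(optimized, control):.0%}")
```

A difference in CTR alone does not establish that the markup caused it, which is why the design pairs collections of similar size, content, and date range.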
60. A Revised Digital Library Architecture
• Collection Page (home page)
– arc.lib.montana.edu/schultz-0010/
• About Pages (about page, topics page)
– arc.lib.montana.edu/schultz-0010/about.php
• Item Pages (individual record page)
– arc.lib.montana.edu/schultz-0010/item/31
• Sitemap and rel=canonical work
– arc.lib.montana.edu/schultz-0010/
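The sitemap and rel=canonical work can be sketched as follows: one function builds a sitemap in the sitemaps.org XML format from the page URLs listed above, and another emits the canonical link tag for a page. This is an illustration rather than the project's actual code. The `http://` scheme is an assumption, since the slide gives the addresses without one, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap document listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def canonical_link(url: str) -> str:
    """Return the tag that tells crawlers which URL is the preferred one."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


# Scheme is assumed; the slide lists these paths without one
base = "http://arc.lib.montana.edu/schultz-0010/"
pages = [base, base + "about.php", base + "item/31"]

print(build_sitemap(pages))
print(canonical_link(base + "item/31"))
```

The sitemap tells crawlers which pages exist, while the canonical tag tells them which address to index when the same record is reachable through several URLs.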
64. Semantic Web Team
• Kenning Arlitsch, Dean @kenning_msu
• Patrick OBrien, Semantic Web Director @sempob
• Jeff Mixter, Research Associate, OCLC Research
• Jason Clark, Head of Lib Informatics and Computing @jaclark
• Scott Young, Digital Initiatives Librarian @hei_scott
• Doralyn Rossmann, Head of Coll Development @doralyn
• Jean Godby, Senior Research Associate, OCLC Research
65. Relevant Publications
• Arlitsch, Kenning, and Patrick S. OBrien. (2013) Improving the visibility and use of digital repositories through
SEO. Chicago: ALA TechSource. ISBN-13: 978-1-55570-906-8
• Mixter, Jeff, Patrick OBrien and Kenning Arlitsch. “Describing Theses and Dissertations using Schema.org,”
Proceedings of the International Conference on Dublin Core and Metadata Applications 2014, Dublin Core
Metadata Initiative: 138-146.
• Arlitsch, Kenning. “Being Irrelevant: How Library Data Interchange Standards have kept us off the Internet,”
Journal of Library Administration, 54, no. 7 (2014): 609-619.
• Arlitsch, Kenning, Patrick OBrien, Jason A. Clark, Scott W.H. Young and Doralyn Rossmann. “Demonstrating
Library Value at Network Scale: Leveraging the Semantic Web with New Knowledge Work,” Journal of Library
Administration, 54, no. 5 (2014): 413-425.
• Arlitsch, Kenning, Patrick OBrien, and Brian Rossmann. "Managing Search Engine Optimization: An Introduction
for Library Administrators." Journal of Library Administration 53, no. 2-3 (2013): 177-188.
• Arlitsch, Kenning, and Patrick S. O'Brien. "Invisible institutional repositories: Addressing the low indexing ratios
of IRs in Google Scholar." Library Hi Tech 30, no. 1 (2012): 60-81.