3. Who’s This Guy?
Keith Schengili-Roberts,
IXIASOFT DITA Specialist
What I do:
• DITA evangelist
• Liaison with OASIS; on DITA
Adoption and Technical Committees
• Industry researcher
• Lecturer on Information Architecture,
University of Toronto
• 10+ Years of DITA XML experience
4. Also Known As “DITAWriter”
• Industry blog started 5+
years ago
• Just over 200,000 hits
• Regularly updated info on:
DITA Conferences
DITA Books
Companies Using DITA
DITA CMSes
DITA Editors
Other DITA Tools
DITA Consulting Firms
• News/views on DITA use
• Features interviews with
those making a difference
in the world of DITA
5. Scope of this Presentation
• HTML-based output from DITA content
• Mechanisms available in DITA to aid with SEO
• Information on what Google is looking for when it
ranks content
• Writing DITA content with better SEO in mind
• Along the way I may burst a few bubbles when it
comes to what techniques do and do not matter
6. Q: WHERE’S THE BEST PLACE
TO HIDE A DEAD BODY?
A: The second page of Google
7. Search Engine Optimization is Magic!
• No, it really isn’t
• There are agencies that
can help with SEO, but
the information is out
there and available
• Recommend Google
Webmasters as a start:
www.google.com/webmasters/
8. How Search Engines Work
• Three key phases:
1. Your content is “crawled”
by a search engine
spider; finds new/changed
info and retrieves it
2. Search engine analyses
and indexes your
website’s content
3. A user submits a query to
a search engine, which
returns a list of possible
links
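The three phases above can be sketched as a toy example. This is only an illustration under loud assumptions: the page URLs and text are hypothetical, the "crawl" is a hard-coded dictionary, and real search engines rank results rather than simply matching words.

```python
from collections import defaultdict

# Phase 1 (crawl): hypothetical pages standing in for retrieved content.
PAGES = {
    "/docs/washer-drain.html": "washer will not drain check the lid switch",
    "/docs/washer-install.html": "install the washer and level the feet",
}


def build_index(pages):
    """Phase 2 (index): map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Phase 3 (query): return the URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "washer drain"))
```

A query for a word that appears on no page returns an empty list, which is the toy equivalent of content that was never crawled or indexed.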
9. How DITA Content is Produced for the Web
1. DITA content is crafted by
writers
2. Content is transformed from
DITA via XSL (typically through
the DITA Open Toolkit) to
XHTML
3. This transformed content is then
placed on the Web
There are steps at each of these
stages that can help improve SEO
10. Serving DITA Content on a Web Server
• Content can be served on
platforms optimized for DITA;
examples include:
Congility DITAweb
Antidot FluidTopics
Zoomin Docs
• These DITA-specific web
platforms come with tools
designed to help your
customers find your content
once they are at your
website
11. Acrolinx Scorecard and SEO
• Acrolinx includes an SEO
rating in their “Scorecard
Summary”
• Works by having user
enter keywords, then the
plug-in analyzes the
related keyword usage in
the document
• Full report advises on
keyword usage in title,
short description,
document body, meta
description, etc.
12. Do You Want to Even Make Your Docs Visible?
• Some companies opt not to have their
documentation “spider-able”:
Company wants search engines to focus
exclusively on marketing content
When there’s a need to point to a
company-sponsored search engine
specifically for docs
• In your webpages, add the following to the
<head> of each page:
All search engines:
<meta name="robots" content="noindex"/>
Google only:
<meta name="googlebot" content="noindex"/>
• Or, add a robots.txt file to your webserver that
says the following:
User-agent: *
Disallow: /tech-docs/
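One way to sanity-check a robots.txt rule like the one above before deploying it is Python's standard `urllib.robotparser` module. This is a minimal sketch; the example.com URLs are hypothetical, and a given crawler may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The same rule shown on the slide: block all spiders from /tech-docs/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tech-docs/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the rules in ROBOTS_TXT let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs for illustration only.
print(is_allowed("https://www.example.com/tech-docs/install.html"))
print(is_allowed("https://www.example.com/products/index.html"))
```

The first URL sits under /tech-docs/ and is blocked; the second is outside that directory and stays crawlable.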
13. Food for (Web) Spiders: sitemap.xml
• You can aid the search engine crawlers
coming to your documentation by
creating a sitemap.xml file that
describes the following:
Parent URL for Website content
URL of specific page
Date that webpage was last updated
(optional)
How frequently the page is likely to change
(optional)
Priority of a given page in comparison to other
pages on the website (optional)
14. Sample sitemap.xml File
• Sample sitemap.xml for
DITAWriter website
• “Priority” value ranges
from 0.0 to 1.0, with
default set to 0.5;
use this to increase
likelihood of your most
important pages being
present in a search index
• Upload your sitemap
file to peer directory/
“starting point”
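As a rough sketch of what such a file contains, the following builds a small sitemap.xml with Python's standard library. The URLs, date, and values are hypothetical placeholders, not the DITAWriter site's actual sitemap; only the element names and namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dictionaries."""
    # Serialize with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is required; lastmod, changefreq and priority are optional.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: the landing page gets a higher priority than a leaf topic.
PAGES = [
    {"loc": "https://www.example.com/docs/", "lastmod": "2016-05-01",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://www.example.com/docs/install.html", "priority": 0.5},
]

SITEMAP_XML = build_sitemap(PAGES)
print(SITEMAP_XML)
```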
15. DITA and Metadata
• Metadata can be incorporated
in DITA at both the map and
topic levels
Bookmaps use the bookmeta
and topicmeta elements as
containers
Topics incorporate metadata
within the prolog
• This content is then
expressed at output
primarily as Dublin Core
metadata
Bookmap
Topic
16. What is Dublin Core?
• Dublin Core is a set of
metadata designed to
describe web content,
related to semantic web
initiative
• Originating in 1995, since
mid 2000s DCMI have
worked with W3C on
Semantic Web efforts
• DITA-OT uses a subset of
Simple Dublin Core v1.1
when outputting to XHTML
17. DITA, DITA-OT and Dublin Core
DITA Element(s) | Dublin Core Equivalent
author (topic), authorinformation (map) | Creator
category | Coverage
[output type: XHTML] | Format
critdates | Date
[id value associated with topic type] | Identifier
publisher (topic), publisherinformation (map) | Publisher
copyright | Rights
source | Source
keyword | Subject
title | Title
[topic type] | Type
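To make the mapping above concrete, here is a minimal sketch that reads a DITA topic and emits Dublin Core style meta tags for a subset of the rows. It is not the DITA-OT's implementation: the sample topic, the author name, and the `DC.<Term>` naming of the meta tags are assumptions for illustration, and the names the toolkit actually emits may differ by version.

```python
import xml.etree.ElementTree as ET
from html import escape

# Subset of the table above: DITA prolog element -> Dublin Core term.
DITA_TO_DC = {
    "author": "Creator",
    "publisher": "Publisher",
    "source": "Source",
    "category": "Coverage",
    "keyword": "Subject",
}

# Hypothetical topic used only for this example.
TOPIC = """\
<concept id="vebulon5_overview">
  <title>What You Need to Know About the Vebulon 5</title>
  <prolog>
    <author>Jane Writer</author>
    <publisher>Acme Corporation</publisher>
    <metadata>
      <category>Appliances</category>
      <keywords><keyword>Vebulon 5</keyword></keywords>
    </metadata>
  </prolog>
  <conbody><p>Placeholder body.</p></conbody>
</concept>
"""


def dublin_core_meta(topic_xml):
    """Return a list of <meta> strings derived from a DITA topic."""
    root = ET.fromstring(topic_xml)
    pairs = []
    title = root.findtext("title")
    if title:
        pairs.append(("DC.Title", title))
    pairs.append(("DC.Type", root.tag))                # topic type
    pairs.append(("DC.Identifier", root.get("id", "")))  # topic id
    prolog = root.find("prolog")
    if prolog is not None:
        for elem in prolog.iter():
            term = DITA_TO_DC.get(elem.tag)
            if term and elem.text and elem.text.strip():
                pairs.append((f"DC.{term}", elem.text.strip()))
    return [
        f'<meta name="{name}" content="{escape(value, quote=True)}"/>'
        for name, value in pairs
    ]


META_TAGS = dublin_core_meta(TOPIC)
print("\n".join(META_TAGS))
```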
19. Google and Dublin Core
• While Dublin Core is long-
established, and the DITA-
OT supports it, Google does
not appear to do much with
this content
• It can be advantageous from
a content management
perspective
For example, info on when a
topic is created and by whom
may be useful to know
Local webserver may be able
to filter content on DC values
20. What About Keywords and Google?
• Forget it, no point (at least from an SEO perspective)
DITA topic
Equivalent XHTML output
21. SO WHAT IS IMPORTANT TO
GOOGLE?
22. Making <title> Count
• Avoid boilerplate titles (e.g. “Introduction”); make
them descriptive (e.g. “What You Need to Know
About the Vebulon 5”)
• Make them concise; Google truncates titles
longer than about 70-75 characters
• Don’t overload them with keywords (e.g. “All About
the Vebulon 5 – Vebulon Five, Fifth Vebulon,
vebulon five, 5th vebulon, vebulon the fifth, Acme
Corporation’s Vebulon Five”)
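These three guidelines lend themselves to a simple automated check. The sketch below is one possible approach, not a rule Google publishes: the 70-character limit, the list of boilerplate titles, and the "more than two repeats" threshold for keyword stuffing are all assumptions chosen for illustration.

```python
import re
from collections import Counter

MAX_TITLE_CHARS = 70  # assumption: lower end of the ~70-75 character range
BOILERPLATE_TITLES = {"introduction", "overview", "about"}  # hypothetical list


def check_title(title, max_chars=MAX_TITLE_CHARS, max_repeats=2):
    """Return a list of problems found in a topic title (empty if none)."""
    problems = []
    if title.strip().lower() in BOILERPLATE_TITLES:
        problems.append("boilerplate")
    if len(title) > max_chars:
        problems.append("too long")
    # Count repeated words, ignoring short words such as "the" or "5th".
    words = re.findall(r"[a-z0-9]+", title.lower())
    counts = Counter(word for word in words if len(word) > 3)
    if any(count > max_repeats for count in counts.values()):
        problems.append("keyword stuffing")
    return problems


STUFFED = ("All About the Vebulon 5 – Vebulon Five, Fifth Vebulon, "
           "vebulon five, 5th vebulon, vebulon the fifth, "
           "Acme Corporation's Vebulon Five")

print(check_title("Introduction"))
print(check_title("What You Need to Know About the Vebulon 5"))
print(check_title(STUFFED))
```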
23. So What Else Does Google Look At?
• Short Descriptions! Displayed immediately after title:
[Figure panels: DITA, XHTML, Google Search]
24. Short Descriptions and Click-throughs
• While short descriptions are not factored in search
engine rankings, user behaviours are
• Google measures click-through rates (CTR)
• A well-written, descriptive short description
encourages more click-throughs
25. Links and Relationship Tables
• One metric thought to influence webpage rankings
is the number of links to a page
• More weight is applied from external URLs
pointing to a webpage than internal ones, but
internal hierarchy counts as well
• Adding relationship tables is not only good DITA
practice, but may also enhance SEO!
26. Relationship Tables = Double the Output
• Relationship tables
result in Dublin
Core metadata as
well as links
[Figure panels: DITA, XHTML Header, XHTML Body]
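To illustrate how a relationship table turns into links, here is a simplified sketch that derives related links from one. The file names are hypothetical, and the logic covers only the default case (topics in different cells of the same row link to each other); it ignores attributes such as `linking` and `collection-type` that a real DITA processor honors.

```python
import xml.etree.ElementTree as ET
from itertools import permutations

# Hypothetical relationship table: one row relating three topics.
RELTABLE = """\
<reltable>
  <relrow>
    <relcell><topicref href="washer_overview.dita"/></relcell>
    <relcell><topicref href="drain_washer.dita"/></relcell>
    <relcell><topicref href="washer_settings.dita"/></relcell>
  </relrow>
</reltable>
"""


def related_links(reltable_xml):
    """Map each topic href to the hrefs it links to via the reltable."""
    links = {}
    root = ET.fromstring(reltable_xml)
    for row in root.iter("relrow"):
        cells = [
            [ref.get("href") for ref in cell.iter("topicref")]
            for cell in row.findall("relcell")
        ]
        # Every cell links to every other cell in the same row.
        for source, target in permutations(range(len(cells)), 2):
            for href in cells[source]:
                links.setdefault(href, []).extend(cells[target])
    return links


LINKS = related_links(RELTABLE)
for href, targets in LINKS.items():
    print(href, "->", targets)
```

Each topic ends up with links to the other two, which is where the extra internal links in the XHTML output come from.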
27. Writing Effective Short Descriptions for SEO
• A well-written short description tells the would-be reader
why it is worth clicking on
Task: tell users what they can accomplish
Concept: tell users about what you are describing and why
they should care
Reference: tell users what the referenced item does or what
it can be used for
Troubleshooting: describe the symptoms of a problem a user
may encounter and let them know that this topic can help
• While shortdesc best practices suggest two sentences,
Google truncates search results at ~156 characters
Need to put most important content first!
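A quick way to see what survives truncation is to preview a topic's short description at the ~156 character mark. This is a sketch under assumptions: the sample task topic is hypothetical, and Google's real cut-off varies, so the limit is a parameter rather than a fixed rule.

```python
import xml.etree.ElementTree as ET

SNIPPET_CHARS = 156  # approximate limit quoted above; treat as an assumption

# Hypothetical task topic used only for this example.
TOPIC = """\
<task id="drain_washer">
  <title>Draining a washer that stopped mid-cycle</title>
  <shortdesc>Drain the washer manually and check the lid switch when a wash
  cycle stops with water still in the tub. This topic also lists the tools
  you need and the safety steps to follow before you start.</shortdesc>
</task>
"""


def snippet_preview(topic_xml, limit=SNIPPET_CHARS):
    """Return (preview_text, was_truncated) for a topic's <shortdesc>."""
    root = ET.fromstring(topic_xml)
    shortdesc = root.find("shortdesc")
    if shortdesc is None:
        return "", False
    # Flatten inline markup and collapse whitespace from line wrapping.
    text = " ".join("".join(shortdesc.itertext()).split())
    if len(text) <= limit:
        return text, False
    # Cut at the last whole word that fits within the limit.
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut + "...", True


PREVIEW, TRUNCATED = snippet_preview(TOPIC)
print(PREVIEW)
print("truncated:", TRUNCATED)
```

Because the first sentence carries the action and the symptom, the preview still makes sense even though the second sentence is cut off.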
28. Schema.org and SEO
• Sponsored by Google, Bing, Yandex and Yahoo!
to “create and support a common set of schemas
for structured data markup on web pages.”
• Its vocabulary is designed for marking up content
with semantic descriptions aimed at web spiders
• Uses Microdata, RDFa, or JSON-LD formats
Sample Schema.org Code Rendered in RDFa
29. DITA and Schema.org?
• Current Schema.org definitions are not focused on
technical documentation, mainly on products
Most common usage is for “Rich Snippets”, describing
info about a product
• There are tools that can help combine RDFa with
XHTML output from DITA
But currently no RDFa/Schema.org implementation
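For a sense of what Schema.org markup could look like for a documentation page, here is a sketch that produces a JSON-LD block (one of the three formats mentioned earlier). It is not something the DITA-OT generates: the choice of the `TechArticle` type, the property selection, and the sample values are assumptions for illustration.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def article_json_ld(title, description, url):
    """Return a JSON-LD <script> block describing one documentation page."""
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",  # assumption: closest general-purpose type
        "headline": title,
        "description": description,
        "url": url,
    }
    # Escape "</" so page text can never close the <script> element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return SCRIPT_OPEN + payload + SCRIPT_CLOSE


# Hypothetical values; in practice these would come from <title> and <shortdesc>.
BLOCK = article_json_ld(
    "What You Need to Know About the Vebulon 5",
    "Learn what the Vebulon 5 does and why it matters.",
    "https://www.example.com/docs/vebulon5.html",
)
print(BLOCK)
```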
30. CRAFTING KILLER SEO CONTENT
WRITING FOR YOUR USERS
31. A Story…
Kenmore Model 80
Clothes Washer
One day it stopped in the
middle of a wash, and
wouldn’t drain…
33. So What About the Clothes Washer Manual?
• Continued search to see if
manual turned up; it didn’t
• Did a different search
specifically for the manual, then
looked for info on my problem
• Problem was there, correct
solution (in this case) was not
34. Writing to Engage with Your Audience
• Previous example underscores how important it is
to anticipate users’ needs
If the information is improperly targeted, is not well-
described or is missing, users will not find it
• Know your users!
Why have they come to your content?
What are they seeking to accomplish?
• This is why having effective personas + scenarios
to help guide your technical writers is a priority
35. How DITA Can Help Shape the Dialogue
• DITA’s topic types set the stage for how technical
writers communicate with their audience
Concept: what is this thing and what is it for?
Reference: what are the correct settings?
Task: how do I accomplish this procedure?
Troubleshooting: how do I fix this problem I am having?
36. Search Engine Technology is Changing
• It used to be that anyone who knew the basics of
Boolean searches (AND, NOT, OR) could expect
to get better search results
• Google has invested significantly in natural
language speech recognition (Google Now)
37. What To Know About Voice Query Usage
• While youngest demographic uses voice queries
the most, rates are also high with adult
demographics
Voice query usage is growing rapidly
• Voice query length longer, typically phrased as a
question
Text queries average 2-3 words, voice 3-5 words
• Voice queries tend to be goal-directed
“Grocery stores near me”
“How do I fix my clothes washer?”
38. Implication of Voice Queries for Tech Docs
• Further emphasizes focus on needs of the user
Think about why they have come to read your docs
What are likely scenarios that have led them to your
docs?
What questions will they ask, and how can you
answer them?
Consider writing titles or short descriptions as possible answers
to a query
If you haven’t already adopted DITA 1.3 troubleshooting topic
type, consider doing so
39. DITA + SEO: Summing Up
• Optimize content for search engines and users
• Consider adding sitemap.xml to help spiders find and
index your content
• Understand that Dublin Core is present in DITA-OT
• Descriptive, concise titles!
• Effective short descriptions can increase CTR
• Relationship tables may also help
• Keep an eye out for future developments from
Schema.org
• Do not think in terms of SEO “tricks”; best thing you
can do is to know your audience and write for them
40. Further Reading
• Google Webmasters: google.com/webmasters/
Meta tags that Google understands:
support.google.com/webmasters/answer/79812?hl=en
• Sitemaps.org: sitemaps.org/
• Dublin Core Metadata Initiative: dublincore.org/
• SEO Pressor Connect Blog: seopressor.com/blog/
• Moz Blog: moz.com/blog
• OASIS Feature article on short descriptions (PDF):
oasis-open.org/committees/download.php/57803/
Have also contributed to OASIS DITA Adoption Committee articles on DITA 1.3
Not planning to look specifically at Yahoo! or Bing; according to Statista, Google makes up roughly 90% of all searches (http://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/)
Okay, let’s get the old joke out of the way. ;)
One of the goals of this presentation is to provide you with the information you need to know in order to better optimize your content for search engines (like Google)
Much of the information at first glance may seem to be conflicting, and over time some things have changed, but good information is available; one of the better sources is from Google themselves
I’ll talk about some of this in more detail later, but for now understand that this is the basic process.
…and I plan to talk about each of these in turn, in reverse order
There are other tools out there (these are some that I am familiar with from my experience at IXIASOFT; these are all partner products)
One example I am aware of: DITAweb implements facetted search, which uses metadata/Subject Scheme info inserted into your DITA topics to help users narrow down the scope of their search
Ensures that things like keyword density, frequency, etc. are optimized; note “keywords” in this context does not necessarily equate to metadata
Info from: https://support.acrolinx.com/hc/en-us/articles/205617711-Checking-Documents-for-SEO and https://support.acrolinx.com/hc/en-us/articles/205617881-The-Acrolinx-SEO-Report
Google Webmasters has good info on adding “noindex” metadata to HTML pages here: https://support.google.com/webmasters/answer/93710?rd=1
For good info on robots.txt file, see: http://www.robotstxt.org/
robots.txt file example code basically says: all search engine spiders should not index content in the /tech-docs/ sub-directory (replace with name of actual directory for your case)
There are programs that can automatically create this file for you, which you can then tweak
Full information can be found on the http://www.sitemaps.org/ website
Note that priority is only related to other pages contained within your own website, use judiciously
If you set a high priority to everything the priority value will be discounted/averaged out
By “starting point” I mean that you should place the file at the top-most level for your documentation. You can have multiple sitemap files, and you can even have a top-most sitemap file that references all of the other sitemaps for a given website (useful for very large websites)
Named after Dublin, Ohio btw ;)
“W3C” – “World Wide Web Consortium”
Am not aware of this info previously made available in this compact format
The DITA example is designed to output as many Dublin Core metatags (and variants) as possible
The same applies to the keyword element applied within body text
Again, it may be of some use outside of SEO
While image is from 2009, this was recently re-confirmed by Google Webmasters
Titles
Short descriptions
Relationship tables
Writing for your users
~70-75 characters for Title length is very new; expanded last week from previous 55-60 characters
Also, note that <meta name="description"… is not Dublin Core
The image shows Google’s Search Console, which allows registered websites to delve into their site stats and what Google measures; “CTR” = click-through rate
Look for a recent white paper I co-wrote with Joe Storbeck of JANA on the DITA Adoption section of the OASIS website for more info on Short Descriptions
http://seopressor.com/blog/dublin-core-vs-schemaorg-metadata-comparison/
Yandex is Russia’s largest search engine
RDFa = “Resource Description Framework in Attributes”
Many SEO-oriented websites talk about how to “trick” the likes of Google (which, let’s face it, is probably the stupidest thing you can do), but from a technical writing perspective I would argue that it is really all about knowing your audience, being honest with them, and ensuring that you deliver the content that they need when and where they need it
What made me first think of this subject (SEO and tech docs in general, then DITA) was an incident that happened a couple of years ago when our clothes washer was in the middle of washing a load of clothes and stopped, still filled with water. It happened late on a Sunday afternoon, so there was little to no chance of calling a repairman who could come and take a look at it immediately. We have two teenage girls and let’s just say we couldn’t be without a working clothes washer for long…
So I went to Google and typed “Kenmore series 80 not draining”. What came up first were some YouTube videos followed by handy tips provided by both users and professionals. In the end, the video provided me with all of the information I needed to know: a busted lid switch was the problem. I used the info from the video to jury-rig it, and get it to work. Then found the replacement part online and later repaired it. Also, note the number of views this video has received (over 200K)!
Not familiar with this function? If your computer has a microphone, turn it on, go to Google and click on the microphone icon in the search field.
People will also be familiar with Siri on Apple iPhone as another example of this, as is Cortana for Windows
Btw: I find that few of these systems understand the word “DITA” and usually interpret it as “DATA”. ;)