Is DITA/XML in your future? Have you heard rumors of an impending CMS? Or do you suspect these tools will be in your future sooner or later?
Two veterans, Steve Jong and Anna Pratt, have moved from FrameMaker/Word/RoboHelp to XMetaL and, variously, Vasont CMS, Perforce, and Microsoft Team Foundation Server CMS. Steve and Anna describe the transition they underwent, expose the love-hate relationship you’ll develop with DITA, and share their insights about how to survive and thrive in an XML world.
2. Overview
1. Concepts: the differences between desktop publishing and structured environments
2. Our Stories: converting existing material to a structured environment
3. Show and Tell: comparing and contrasting our work environments and processes
4. Survey Says: How Anna’s group feels about the change
5. Summary: How you too can survive (and even thrive!) in an XML world
6. Demonstration: making a change to a topic
10/15/2015Surviving Thriving in an XML World
3. What is Structured Information?
Structured information is information that has been analyzed, categorized by type, and organized
Compare with data typing
Intellectual-property implementation: Information Mapping®
Topics: concepts, tasks, references
Example: a step is part of a task (but not a concept)
Structure is separated from format
Commercial tools now support structured information
4. What is DITA XML?
Darwin Information Typing Architecture
eXtensible Markup Language (widely used)
One implementation of Standard Generalized Markup Language (SGML)
5. DITA Advantages
Implements structured information
Readily supports reuse and reorganization
True single sourcing
CMS library uses database technology to support a secure, controlled, multi-user environment
Supports large, highly complex documents well
Supports conditionalization well
Integrated, automated workflow (where used)
The bigger the group, the bigger the library, the more topics are shared, the more translation done, the more channels… the better the case for DITA
6. DITA Disadvantages
Rigidly enforces both organization and structure
Very little control over format
Separation of steps in create/edit/produce process
Lots of do-it-yourself or open-source pieces
Requires dedicated support personnel for even small changes
Attention to detail by all team members is critical: one rogue actor can do real damage
7. To quote a wise man:
“Here’s my number. My company can help you when you’re ready to convert back…”
8. Anna’s Awesome Adventure
Into the wild world of DITA and XML
Established company, using FrameMaker, Word, AuthorIT
Acquired by a larger company with dispersed doc teams, using FrameMaker, Word, RoboHelp, DocBook
Tech Comm team researched DITA and CMS options and chose XMetaL and DITA as production tools
Consulting firm converted inaugural document set
Dispersed teams converted remaining docs
9. DITA + CMS =
Recently put a CMS in place (Team Foundation Server, TFS)
Migrated 90% of projects to CMS (based on product build systems)
Kept the other 10% in another source control system
11. Favorite? Mistakes
Forgot to add topics to source control
Referenced a deleted topic
Worked with transforms (or not)
12. Steve’s Stirring Story
Startup company, using FrameMaker/Word, Acrobat Pro
No online help
Acquired by larger company with a five-year-old DITA/CMS environment in place (and the scars to prove it)
New tools: Vasont CMS, XMetaL editor, Apache FOP
Existing documents converted into XML by a third-party vendor
Online help, but no translation
13. What Worked Well (Steve)
Vendor successfully converted 6 books/1000 pages
Vasont and home-grown scripts incorporate many process steps
Main book has 3 modes; once converted, 1 click generates any of 3 output formats (including Help)
20 writers maintain a CMS library of 700 books, 50,000 topics
14. What Didn’t Work Well (Steve)
Neither seamless nor transparent
Considerable manual cleanup (that I couldn’t let go)
One topic was 100 pages, formatted as a glossary; subsequently split (manually) into ~100 reference topics
The new work environment uses remote connection; can’t copy/paste from spec to topic
Generating output can take 30 minutes
17. Works Well
All teams use the same tools
Documented DITA standards and processes put every team in the same boat… (which only feels like it’s sinking)
Reusability is much easier
Conditional text expands content capabilities
Content tagging for localization
Agile Development Environment
18. Pain Points
Transition process: getting content from the source tool to the XML tool
Identifying topics to share
Choosing the topic type
Shortcuts
Content tagging for localization
Finding information in the source
19. Working From a Distance
22. Structure Begins in Your Mind
Focus narrowly
Look for concepts, tasks, and references (and name them that way)
Keep everything simple
Look for opportunities to share
Stick to the style guide!
Don’t repeat; reuse
… But don’t push sharing too far
Don’t use formatting to indicate structure
Don’t mix concept, task, and reference material in a topic
Don’t write lists as sentences (and learn to accept one-item lists)
23. Converting Existing Documents
Format is not structure
Don’t expect too much from automated conversion
Blended material (concept + task + reference) can’t be categorized, so you will have to rewrite it
Decide on the workflow and file structure/hierarchy
Plan your transition schedule
Budget time and money for ongoing maintenance of your DITA system
Plan to have someone on staff familiar with XSLT and DITA
Get buy-in from all writing teams
Make everyone aware of the benefits and the limitations of DITA
24. Surviving
Let it go…
Ownership
Format control
Page breaks
Heading/footer navigation for the user
“Keep with next”
Tables with no continued titles
Quality control
Personal preferences and opinions (even if you’re right!)
25. To quote a wise woman
“…although our local team had some 150 years of collective tech writing experience, our DITA deliverables reflect intern-level quality at best.”
26. Thriving
Become familiar with DITA, XML, and .bat files
Figure out the best way to use ditavals and variables
Learn to love F11 to go from this:
To this:
27. Thrive on…
Discover the joy of reading log files and code
NEVER lose your sense of humor!
28. Demonstration
Making a single change to a topic
1. Extract topic from CMS library
2. Edit topic
3. Upload topic into CMS library
4. Render document (as PDF)
35. Edit: Raw View
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "http://cmsrender01/dtd/task.dtd" [
<!ENTITY cbl "cable">
<!ENTITY wln "wireline">
<!ENTITY wls "wireless">
]>
<?VasontExtractInfo entity_id="4164257" version_id="11883"?><task
id="t_cmp_wireless_usr_managing_network_elements_creating_a_network_element_group"><title
id="v4000867">Creating a Network Element Group</title><prolog id="v5356143"><metadata id="v5356144"><keywords
id="v4000871"></keywords></metadata></prolog><taskbody id="v4000872"><context id="v4000873">To create a
network element group:</context><steps id="v4000876"><step id="v4000877"><cmd id="v4000878">From the <uicontrol
id="v4301940">Network</uicontrol> section of the navigation pane, select <uicontrol id="v4000879">Network
Elements</uicontrol>.</cmd><stepresult id="v4256868">The content tree displays a list of network element groups; the
initial group is <uicontrol id="v4000881">ALL</uicontrol>.</stepresult></step><step id="v4000882"><cmd
id="v4000883">From the content tree, select the <uicontrol id="v4000884">ALL</uicontrol> group.</cmd><stepresult
id="v4256870">The Network Element Administration page opens in the work area.</stepresult></step><step
id="v4000887"><cmd id="v4000888">On the Network Element Administration page, click <uicontrol
id="v4000890">Create Group</uicontrol>.</cmd><stepresult id="v4256872">The Create Group page
opens.</stepresult></step><step id="v4000892"><cmd id="v4000893">Enter the name of the new network element
group.</cmd><info id="v5040534"><ph id="v5040535" otherprops="pr_215419"> The name can be up to 255 characters
long and must not contain quotation marks (") or commas (,).</ph></info></step><step
id="step_5C4171AF06904E3D8FE7C244CD094C8D"><cmd id="v4286870">Enter a text description of the network
group.</cmd></step><step id="v4000896"><cmd id="v4000897">When you finish, click <uicontrol
id="v4000898">Save</uicontrol> (or <uicontrol id="v4000899">Cancel</uicontrol> to discard your
changes).</cmd><stepresult id="v5040536">The new group appears in the content
tree.</stepresult></step></steps><result id="result_8E82944CCCEB461C96873A9B38F8E599">You have created a
network element group.</result></taskbody></task>
38. References
DITA Users (dita-users) Yahoo! group
“Structured Information: Navigation, Access, and Control.” Steven J. DeRose, 1995. http://sunsite.berkeley.edu/FindingAids/EAD/derose.html (retrieved 15 Sep 2013)
“Crash Course for Content Management.” Vasont Systems. https://www.vasont.com/resources/resources-crash-course-for-content-management.html (retrieved 11 Nov 2013)
DITA Best Practices: A Roadmap for Writing, Editing, and Architecting in DITA. Laura Bellamy, Michelle Carey, Jenifer Schlotfeldt. IBM Press, 2011.
We each have over 30 years of experience, going back to #2 computer company (Honeywell)
We’ve each used all the tools
We each were using FrameMaker, and now we each use DITA XML
When we got together to compare notes, we discovered we’re using DITA in some very different ways
Our presentation is stronger for our discovered differences
We are both productive in the new environment
A pessimistic approach would be to cross out “thriving,” but we’ll take the optimistic view
What is structured information?
Information that has been analyzed, typed, and organized
I am a long-time advocate of structured documentation, before there were supporting tools
Information Mapping has seven map (topic) types: procedure, process, principle, concept, fact, structure, classification
Three types are widely used: concept, task, reference
Key point: As data can be typed (string, integer, floating-point), so can information be typed
Certain information types and elements go together; for example, a task can have a step (but a concept cannot)
Key point: Structure is separated from format
Is it bold because it’s a level head, or a level head because it’s bolded?
Is it a cross-reference because it’s blue and underlined, or blue and underlined because it’s a cross-reference?
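To make the information-typing point concrete, here is a minimal Python sketch of "a task can have a step, but a concept cannot." The `ALLOWED` table is a deliberately tiny, hypothetical content model written for this illustration; the real DITA DTDs are far richer, and the function name `structure_errors` is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified content model (NOT the real DITA DTD):
# for each parent element, the child elements it may contain.
ALLOWED = {
    "task": {"title", "taskbody"},
    "taskbody": {"context", "steps", "result"},
    "steps": {"step"},
    "concept": {"title", "conbody"},
    "conbody": {"p", "section"},
}

def structure_errors(xml_text):
    """Return a message for every child that its parent's rule does not allow."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag)
        if allowed is None:
            continue  # the toy model records no rule for this element
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors

TASK = ("<task><title>Creating a Group</title><taskbody><steps>"
        "<step><cmd>Click Create Group.</cmd></step></steps></taskbody></task>")
CONCEPT = ("<concept><title>Groups</title><conbody><steps>"
           "<step><cmd>Click Create Group.</cmd></step></steps></conbody></concept>")

task_errors = structure_errors(TASK)
concept_errors = structure_errors(CONCEPT)
print(task_errors)
print(concept_errors)
```

The same steps pass inside the task and are rejected inside the concept, which is the kind of decision a structured editor makes for you on every keystroke.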
Concept: What is DITA XML?
Darwin Information Typing Architecture, developed by IBM and now an open standard maintained by OASIS; an end-to-end structure that uses a subset of the IM topic types
eXtensible Markup Language, used very widely in programming to exchange data between systems
One implementation of SGML; there are others (who here is familiar with VAX DOCUMENT?)
Very flexible, but not bandwidth efficient
Deep and complex environment, with multiple ways to do things
We are both using the full, strict DITA environment
(How widespread is XML? Who here uses XML? Who uses Word? Who uses Word .DOCX format? You’re using XML!)
Implements structured documentation (which Steve has advocated for many years)
Ready reuse—every element (paragraphs, steps, notes, art, topics) available for sharing and reuse
Can reorganize and move from level to level
True single sourcing (one-click rendering to multiple output formats)
CMS library uses DB technology to implement controlled multi-user environment
Steve’s environment ties changes to PR database, allowing end-to-end tracking (PITA, but I love it!)
Supports large, highly complex documents well (doesn't crash like some programs--"Word"!)
Conditionalization works well (doesn't crash like Frame)
Integrated, automated workflow (but Steve’s group doesn't use it)
DITA is rigid--enforces discipline in both organization and structure--no more freelancing, no Word/Frame overrides
(Imagine writing poetry where the tool enforces rhyming with a drop-down list of acceptable words: “I wandered lonely as a cloud” -> crowd, loud, proud, shroud ...)
Very little control over format
Separation of steps in create/edit/produce process
Not an integrated development environment (but desktop era was an anomaly)
Graphics tools aren’t integrated as in Word/Frame, but you can create externally and link in (which works well)
Compilation errors are possible
Lots of do-it-yourself or open-source pieces, strung together using scripts or .BAT files (Frame+WebWorks+Acrobat is proprietary, but smooth)
Overhead of dedicated support personnel (Steve’s group has 2 people, but better than DEC's 1:1 ratio)
Since the whole point is to share and reuse topics, teamwork is critical
In our environment it’s required
Unlike the Word or Frame world, file naming and location are critical (otherwise you can’t find topics in the database or the book won’t render)
Lack of attention to detail is glaring
Lone-wolf writers can do real damage
It’s a constant struggle against entropy
THREE POINTS:
Company who acquired us ended up with 3 dispersed teams, each using a different set of tools
Doc lead:
Directed to make all the docs/help look like they belong to one company
Put together a team to research CMS and DITA options (decision: no CMS due to $$; XMetaL and DITA Open Toolkit for processing)
Vendor performed an inaugural doc set conversion and provided us with
Transforms
XMetaL attribute/elements to use
Converted XML files
THREE POINTS:
Company adopted a CMS for all products
Migrated bugs, product code, docs, feature requests, and more to the CMS
Good news:
many of our docs now build with the nightly product builds which make doc deliverables available to all
because we use the same CMS, all of the dispersed teams can access other doc teams’ files if needed
THREE POINTS:
As Steve mentioned, hierarchies are critical: they ensure that links work, shared files are available for access, and output succeeds
In some CMSes, like the ones that our company uses, if you fail to check something in, builds will not work
Because we access so many different pieces of software in the new workflows, we have many more possible points of failure
THREE POINTS:
If your new material does not appear in the doc that you just built, it might not be checked into source control.
If your docs build as part of the product build, you now have the power to break not only the doc builds, but also the product builds. You will meet engineers you never knew existed as they let you know that you BROKE the build.
Transforms:
Learning the limitations of the open-source software
Even if your transforms work on your own system, you still need to test with every build system that the group uses
First output run:
Point to source folder. Done.
Point to output folder. Made same as source folder. Done.
Press Enter. Done. Message: couldn’t find source.
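The "referenced a deleted topic" and "couldn't find source" mistakes can often be caught before a build. The following is a minimal sketch, assuming a flat DITA map whose `topicref` elements carry relative `href` values; the function name `missing_topics` and the file names are made up for the example, and this is not part of any build system described here.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def missing_topics(map_path):
    """List topicref hrefs in a DITA map that do not exist on disk."""
    map_path = Path(map_path)
    root = ET.parse(map_path).getroot()
    missing = []
    for ref in root.iter("topicref"):
        href = ref.get("href")
        if not href:
            continue  # a topicref without href is just a grouping node
        target = href.split("#", 1)[0]  # drop any element fragment
        if not (map_path.parent / target).is_file():
            missing.append(href)
    return missing

# Demo in a throwaway folder: one topic exists, one was "deleted".
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp)
    (book / "tasks").mkdir()
    (book / "tasks" / "t_create_group.dita").write_text(
        "<task id='t1'><title>Creating a Group</title></task>", encoding="utf-8")
    (book / "book.ditamap").write_text(
        "<map><topicref href='tasks/t_create_group.dita'/>"
        "<topicref href='tasks/t_deleted.dita'/></map>", encoding="utf-8")
    missing = missing_topics(book / "book.ditamap")

print(missing)
```

Running a check like this against a clean checkout (rather than your own working folder) also surfaces topics you forgot to add to source control.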
In this world, I can get out of my own way
People in Steve’s group (North Carolina) resigned when DITA came in because it was such a different way of working
The scars were still there when we got converted
When Steve’s local group converted, we were floundering
Steve’s group used a third-party conversion company (don't name) to convert our FrameMaker files into XML topics
(Who here knows who this is? Freddie Mercury of Queen meme)
(Who here knows who drew this? His initials are in the drawing. Sidney Paget, for Sherlock Holmes, 150 years earlier)
Put another way: They said it would be transparent; hah!
Steve suffered from carpal-tunnel syndrome for four months after cleaning up the conversion
Steve is now working in an existing, well-defined document structure
MS Remote Desktop Connection (to Windows Enterprise Server); DITA 1.2; Vasont CMS ST (V14); XMetaL Author Enterprise 7; Apache FOP (formatting object processor) rendering engine
5 passwords to the inner sanctum: encrypted disk; laptop sign-in; Remote Desktop Connection; CMS; bug DB
Can’t copy/paste across the environment
TekQuest (customized IBM ClearQuest) DB for requirements and problem reports; all work must be tied to a problem report (PR)
Multiple writers, multiple products, multiple releases, multiple modes; many shared topics
Single-sourced; output to Eclipse help (XML), HTML internal docs (HTML) and DVD (PDF)
CMS database contains the DITA C, R, and T topics (we have roughly 50,000); topics, art source files, book maps, DITAvals; all in a soup
Book maps 700+
DITAval files 600+
Art source 5900+
C 20,000+
T 18,000+
R 10,000
Ignoring the implementation details, this is how things are supposed to work conceptually
TL 9000 (ISO 9001 quality system for telecoms) greatly influences our process (our use of the bug database)
Key point: Steep learning curve
Recommended practice is to work on one topic at a time (but I pull down the whole book)
We are not part of the software build process; our Help systems are checked in separately
Anna polled her group, and here are the (anonymous) results
Teams:
Easier to help others out regardless of project
Easier to maintain one set of tools
Consistent output across company
Teams feel like they are part of the same company
Self-ex
REUSE: Because everyone uses the same tools and writing style/standards, shared content is less likely to be identified as having been contributed by someone else.
CONDITIONS: Because the conditions are turned on or off in one ditaval file, it is much easier to target an audience, for example, reviewers, customers, or Oracle or SAP users.
TAG: Tagging lets our localization team quickly identify what NOT to localize (product names) and what to watch out for (UI elements).
AGILE: Because writing for DITA requires smallish topics and Agile development works in small chunks, DITA doc’ing actually fits well into the workflow.
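As an illustration of the CONDITIONS point, here is a minimal Python sketch of what a ditaval "exclude" does to a topic. It assumes the standard `<val>`/`<prop att= val= action=>` DITAVAL shape and the `audience` attribute; the function names are invented, and the rule is simplified (real processors handle several attributes, flagging, and defaults).

```python
import xml.etree.ElementTree as ET

def excluded_values(ditaval_text, att):
    """Return the values a DITAVAL file excludes for one attribute."""
    val = ET.fromstring(ditaval_text)
    return {
        prop.get("val")
        for prop in val.iter("prop")
        if prop.get("att") == att and prop.get("action") == "exclude"
    }

def apply_filter(topic_text, ditaval_text, att="audience"):
    """Drop elements whose every listed value for `att` is excluded."""
    root = ET.fromstring(topic_text)
    excluded = excluded_values(ditaval_text, att)
    for parent in list(root.iter()):
        for child in list(parent):
            tokens = (child.get(att) or "").split()
            if tokens and all(token in excluded for token in tokens):
                parent.remove(child)  # note: also drops the element's tail text
    return root

DITAVAL = '<val><prop att="audience" val="reviewer" action="exclude"/></val>'
TOPIC = ('<concept id="c1"><title>Groups</title><conbody>'
         '<p>Shown to everyone.</p>'
         '<p audience="reviewer">Reviewer-only note.</p>'
         '</conbody></concept>')

filtered = apply_filter(TOPIC, DITAVAL)
remaining = [p.text for p in filtered.iter("p")]
print(remaining)
```

Switching audiences then means swapping the one ditaval file, not touching the topics.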
Getting content from there to here
Copy and paste harks back to an earlier methodology. Where were the tools that we needed?
DocBook team used XSL to convert their files, but still had to go through each xml file manually to double-check that all links, images and conversions worked.
Learning the limitations of our tools; why can’t you use an unordered list under a step? You really can’t get there from here (or do that—ever!)
We’re still finding topics that we could have shared and are still in the process of clean-up and consolidation.
What is a reference and what is a concept? Use one topic type; discover that others use another topic type for the same content type.
Very few shortcuts available, requiring lots of highlighting and mouse manipulation
Love/hate relationship
Where is the content (grepwin: thank you!)
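A plain text search can miss a phrase that is interrupted by inline tags or line breaks in the XML source. As a rough sketch of one workaround (not a replacement for grepWin), the Python below strips the markup before searching; the function name `find_text`, the `.dita` extension, and the sample files are assumptions for the demo.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def find_text(folder, phrase):
    """Return names of topic files whose text (tags removed) contains phrase."""
    hits = []
    for path in sorted(Path(folder).rglob("*.dita")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        # Join all text nodes, then collapse runs of whitespace and newlines.
        text = " ".join("".join(root.itertext()).split())
        if phrase.lower() in text.lower():
            hits.append(path.name)
    return hits

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "t_select.dita").write_text(
        "<task id='t1'><title>Selecting a Group</title><taskbody><steps><step>"
        "<cmd>From the content tree, select the <uicontrol>ALL</uicontrol>\n"
        "group.</cmd></step></steps></taskbody></task>", encoding="utf-8")
    (folder / "c_groups.dita").write_text(
        "<concept id='c1'><title>Network Elements</title><conbody>"
        "<p>Groups organize network elements.</p></conbody></concept>",
        encoding="utf-8")
    hits = find_text(folder, "select the ALL group")

print(hits)
```

The phrase is found even though `<uicontrol>` tags and a line break split it in the source file.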
How would we summarize our experiences? I want to emphasize two key points about working in an XML world: distance and pace
Key point: Distance (pointers)
Some people fail to grasp this!
This is an environment where you have to code and compile a document, and until it renders you don’t know whether it’s worked or not
Where once you were responsible for a book, or a doc set, now you are responsible for topics, or book map(s)
DITA book map (DITA MAP) is a collection of pointers (some people never get this)
In Vasont, your level of access is only an illusion (like an Amazon book page)
This means you are working through the book map (TOC) to get to the pointees (topics)
When you double-click a line you open the pointer; to open the file itself takes 3 additional steps
Reorganizing a book means editing the book map
One at a time, not globally
Delete and move are two-step operations
(Big deal, and you never delete a topic from the database, just remove it from the map; PITA, but I love it!)
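To show what "a collection of pointers" means in practice, here is a small Python sketch that reorganizes a book by editing only the map. The map content, file names, and the helper `move_topicref` are invented for the illustration, and the sketch handles only a flat map (no nested `topicref` elements).

```python
import xml.etree.ElementTree as ET

MAP = """<map>
  <title>User Guide</title>
  <topicref href="concepts/c_groups.dita"/>
  <topicref href="tasks/t_create_group.dita"/>
  <topicref href="references/r_limits.dita"/>
</map>"""

def move_topicref(map_root, href, new_position):
    """Move one pointer among the map's top-level topicrefs."""
    refs = map_root.findall("topicref")
    ref = next(r for r in refs if r.get("href") == href)
    # A move is two steps: take the pointer out, then put it back elsewhere.
    refs.remove(ref)
    refs.insert(new_position, ref)
    for old in map_root.findall("topicref"):
        map_root.remove(old)
    map_root.extend(refs)

book_map = ET.fromstring(MAP)
move_topicref(book_map, "references/r_limits.dita", 0)
order = [ref.get("href") for ref in book_map.findall("topicref")]
print(order)
```

No topic file is opened or changed; only the order of the pointers in the map differs afterward.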
Extract-edit-reload cycle means you're working on a copy of the topic
For me, at least, every single change is working on a ship in a bottle in a refrigerator in a room in a house in a gated community--all locked
And after working with a CMS you will be qualified to work with radioactive materials
On the bright side, this definitely makes us content developers
This is as hard as anything engineers do
No one is impressed by working in Word, but this is impressive
Same issues with variable names, file names, structure, metadata, pointer lengths, null pointer exceptions
Renderings can silently fail, leaving you to hunt through the code like any engineer
The Sholes typewriter QWERTY keyboard was laid out to slow down typists
Key point: In this world you have to think first, then work slowly and deliberately
One topic at a time (though in practice we grab whole books)
Every reload is an adventure (will it work?); Steve’s average 1:20
Response time
CMS (database) updates (every element must have a unique key)
Book map changes
Steve’s method of making global changes is now harshly punished; changes must be made topic by topic
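Because every element must have a unique key, a copied-and-pasted step is an easy way to make a reload fail. As a hedged sketch (the CMS performs its own checks, which are not shown in this talk), the Python below looks for repeated `id` values in one topic before you try to load it; `duplicate_ids` and the sample topic are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(xml_text):
    """Return the id values that appear on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get("id") for el in root.iter() if el.get("id"))
    return sorted(value for value, n in counts.items() if n > 1)

# The second step was pasted from the first and still carries its id.
TOPIC = ('<task id="t_example"><taskbody><steps>'
         '<step id="v4000877"><cmd id="v4000878">Select Network Elements.</cmd></step>'
         '<step id="v4000877"><cmd id="v4000883">Select the ALL group.</cmd></step>'
         '</steps></taskbody></task>')

duplicates = duplicate_ids(TOPIC)
print(duplicates)
```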
What can you do to prepare for this possible future?
You should be doing this today, regardless of your tools or future plans
Key point: Structure starts in your mind
You can format a paragraph to resemble a level head, but format is not structure
Conversion is the process of crossing fingers and hoping for the best
Impossible to categorize blended material
Better luck finding content in a can of alphabet soup
After conversion: It’s hard to anticipate sharing; opportunities grow organically
If you try too hard to share topics you’ll end up with byzantine structures
This is especially important if you work with a global team
So far, it has taken us 2 years; we are still working on issues and bugs in the system
While you’re budgeting, see about training in your new tools
Training, training, training? Check with your current staff. You might find that you have the expertise on-hand, but didn’t use it with your old tools.
We know that management usually makes decisions based on the bottom line
We’re still discovering the benefits and limitations of DITA
Ownership: We are US and everything is OURs.
You no longer own the book or the help system.
You no longer control the tone, the flow, or the writing style.
Format, no worries:
Our page breaks are cringe-worthy!
We did fix the issue of a heading at the bottom of one page and the related content on the next page…but not right away.
I often quickly turn away from the lone bullet at the bottom of the page, or the step, or the sub-step
Who really needs to know what chapter they’re in or under what heading they are reading?
What!? The table continues on another page—to another 4 pages?
We cannot keep code blocks together.
Quality control: Rather than focus on writing, we focus on tools (result: loss of time and quality)
STC competitions- not now:
No format control
No content control
No output control
EVER?: refer to Steve’s talk at InterChange 2014. It’s a new changing world.
Ditavals and variables are powerful tools—learn the best way to use them in YOUR environment
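One simple form of variable is an XML entity, like the `cbl`, `wln`, and `wls` entities declared at the top of the Raw View topic. The sketch below shows how a parser expands such an entity; using `&wls;` in a title is an assumption made for illustration, not something shown in the talk.

```python
import xml.etree.ElementTree as ET

# Entities declared in the topic's internal DTD subset act as named text
# snippets. The title below is a hypothetical use of the &wls; entity.
TOPIC = """<?xml version="1.0"?>
<!DOCTYPE task [
<!ENTITY cbl "cable">
<!ENTITY wln "wireline">
<!ENTITY wls "wireless">
]>
<task id="t_example"><title>Configuring &wls; mode</title></task>"""

# The parser replaces each entity reference with its declared text.
title = ET.fromstring(TOPIC).findtext("title")
print(title)
```

Changing the declaration in one place changes the text everywhere the entity is referenced.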
(Steve’s log file for a 500-page book is over 500 pages)
Steve is going to make and flag a small, real change to a document
This should demonstrate Steve’s work environment (and the deliberate pace of work)
TL9000 made us rely on the bug DB (zucchini process)
Oracle VPN
RDC to 100.64.nn.nn (Test server)
PR 233586 (network element group name 255 -> 250 characters)
You work through a book map
Extract topics from CMS database library
You can theoretically make changes directly within Vasont (“why would anyone want to use an external editor?”), but it's editing a database record, and that way lies madness (ship in a salt shaker)
This is at a remove from working directly on file
Select the pointer to a topic and open the topic itself in a separate navigator window (this was very confusing to us)
Check out a topic to lock it ("Mother, may I?")
Best practice, and crucial in a multi-writer environment
Check that you have the latest version (good practice); otherwise, you risk overwriting other work
Check to see where else topic is used
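The "where else is this topic used" check amounts to inverting the pointers in every map. This is a minimal Python sketch of the idea, assuming maps are available as XML text; the CMS has its own where-used feature, and the names `where_used`, `MAPS`, and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MAPS = {
    "user_guide.ditamap": (
        "<map><topicref href='concepts/c_groups.dita'/>"
        "<topicref href='tasks/t_create_group.dita'/></map>"),
    "admin_guide.ditamap": (
        "<map><topicref href='tasks/t_create_group.dita'/></map>"),
}

def where_used(maps):
    """Build an index from each topic href to the maps that point at it."""
    index = defaultdict(list)
    for name, text in sorted(maps.items()):
        for ref in ET.fromstring(text).iter("topicref"):
            href = ref.get("href")
            if href and name not in index[href]:
                index[href].append(name)
    return dict(index)

usage = where_used(MAPS)
print(usage["tasks/t_create_group.dita"])
```

A topic that shows up under more than one map is a shared topic, so an edit made for one book changes the others too.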
Structure is enforced in the folder structure of extracted files (you can fight it, but why try?):
sjong
{book}
concepts
references
tasks (this is a task)
Edit topic locally
Expects files in the structure Vasont uses
If you move them elsewhere, or work directly on a topic, you can break the cross-references
(DF was fired for not understanding this)
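Since the extracted folders mirror the topic types, a quick script can flag a topic that has drifted into the wrong folder. The sketch below assumes the concepts/references/tasks layout listed above and that each file's root element names its type; the mapping table, the function `misplaced`, and the file names are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Assumed mapping from a topic's root element to its expected folder.
FOLDER_FOR_ROOT = {"concept": "concepts", "reference": "references", "task": "tasks"}

def misplaced(files):
    """files maps a relative path to topic XML; return paths in the wrong folder."""
    wrong = []
    for rel_path, text in sorted(files.items()):
        expected = FOLDER_FOR_ROOT.get(ET.fromstring(text).tag)
        folder = PurePosixPath(rel_path).parent.name
        if expected is not None and folder != expected:
            wrong.append(rel_path)
    return wrong

FILES = {
    "book/tasks/t_create_group.dita": "<task id='t1'><title>Create</title></task>",
    "book/concepts/t_delete_group.dita": "<task id='t2'><title>Delete</title></task>",
}

wrong_folder = misplaced(FILES)
print(wrong_folder)
```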
It enforces structure (information typing) in a "smart" (AKA rigid) way
For example, no steps in a concept; no results before an action
Assures a “well-formed” document
Practically speaking, it's a guessing game what you can do and where it goes
My advice is the same as Anna’s: let it go
As Penny said, it really tries to approximate WYSIWYG
But I consider it unusable, and I don’t use this view
You can’t find attributes in Tags On view
You can’t find tags in Preview
Can only link to graphics--no embedding, but XMetaL can display them
XMetaL can render output directly, but it looks different from any rendering we use, so we don't use it
This is what I use, but it’s visually confusing (easy to miss spaces)
Frustrating: you know how Word knows better than you what to do? This is worse
This is where it’s easiest to add and edit attributes (Attribute Inspector is on)
Tag changes (both for change bars and to tie to PR database)
If you tag the wrong element, change bar will be wrong (too much or too little)
Overlapping changes are a challenge
Fit and finish issues:
Small targets
Vasont scrolls window
Keyboard defaults aren’t there (have to overuse mouse)
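The attribute tagging described above can be pictured as a small, targeted edit. The Python sketch below changes the character limit from the demonstration PR and stamps the edited element. The `otherprops="pr_…"` pattern is taken from the Raw View topic, but whether a new PR replaces the old value or is added alongside it is an assumption here, and `tag_change` is an invented helper.

```python
import xml.etree.ElementTree as ET

STEP = ('<step id="v4000892"><cmd id="v4000893">Enter the name of the new '
        'network element group.</cmd><info id="v5040534">'
        '<ph id="v5040535" otherprops="pr_215419"> The name can be up to '
        '255 characters long.</ph></info></step>')

def tag_change(step_text, old, new, pr_number):
    """Edit the text of matching <ph> elements and tag them with the PR."""
    step = ET.fromstring(step_text)
    for ph in step.iter("ph"):
        if ph.text and old in ph.text:
            ph.text = ph.text.replace(old, new)
            # Assumption: the newest PR replaces the previous tag value.
            ph.set("otherprops", f"pr_{pr_number}")
    return ET.tostring(step, encoding="unicode")

edited = tag_change(STEP, "255 characters", "250 characters", 233586)
print(edited)
```

Only the `<ph>` element is tagged, which matters because tagging a larger element (the whole step, say) would make the change bar cover too much.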
Shows the XML code, database metadata
This is where you can find where attributes are used
You can edit in this view—you can edit XML files using Notepad!—but a mistake can break the topic and prevent rendering altogether
This is the way the file is “actually” stored on disk
Not meant to be human-readable
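If you do edit the raw XML, it is worth confirming the file still parses before loading it back. Here is a minimal Python sketch of such a check; note that it tests only well-formedness (matched tags, quoted attributes), not validity against the DITA DTD, and the function name `well_formed` is an assumption.

```python
import xml.etree.ElementTree as ET

def well_formed(xml_text):
    """Return (True, '') if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

GOOD = '<task id="t1"><title>Creating a Network Element Group</title></task>'
# A hand edit in Notepad lost the closing </title> tag.
BROKEN = '<task id="t1"><title>Creating a Network Element Group</task>'

good_result = well_formed(GOOD)
broken_result = well_formed(BROKEN)
print(good_result)
print(broken_result)
```

The parser message includes a line and column, which is usually enough to find the damaged tag.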
Load edited topic back into Vasont library (which cancels checkout)
It takes 12 clicks
Pay attention! Failures appear quietly (this window says loading is finished; it doesn’t announce it succeeded)
If you forget to reload edited topic, you haven't changed it at all
If you forget to check it out (in Steve’s environment), it won't reload
Tip: if it looks like it's not working, it's working; if it works immediately, it didn't work at all
Steve actually typically extracts a whole book and works locally; this is probably bad practice
Steve can multitask, making a change in topic X+1 while saving topic X, but it's error-prone (load w/o edit, forget to load) and probably bad practice
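The rules of the extract-edit-reload cycle can be modeled in a few lines. The class below is a toy, in-memory stand-in written for this illustration; it is not Vasont's API, and every name in it is hypothetical. It captures three behaviors from the notes: an extract is only a copy, a reload without a checkout is refused, and a successful reload cancels the checkout.

```python
class ToyLibrary:
    """Toy stand-in for a CMS library (hypothetical; not Vasont's API)."""

    def __init__(self, topics):
        self.topics = dict(topics)
        self.locks = {}  # topic id -> writer holding the checkout

    def check_out(self, topic_id, writer):
        holder = self.locks.get(topic_id)
        if holder is not None and holder != writer:
            raise PermissionError(f"{topic_id} is checked out by {holder}")
        self.locks[topic_id] = writer

    def extract(self, topic_id):
        return self.topics[topic_id]  # the caller edits a copy, not the record

    def reload(self, topic_id, writer, text):
        if self.locks.get(topic_id) != writer:
            raise PermissionError(f"{topic_id} is not checked out by {writer}")
        self.topics[topic_id] = text
        del self.locks[topic_id]  # reloading cancels the checkout

library = ToyLibrary({"t_create_group": "The name can be up to 255 characters long."})
local_copy = library.extract("t_create_group").replace("255", "250")
library_before_reload = library.topics["t_create_group"]  # still the old text

try:
    library.reload("t_create_group", "sjong", local_copy)  # forgot to check out
    rejected = False
except PermissionError:
    rejected = True

library.check_out("t_create_group", "sjong")
library.reload("t_create_group", "sjong", local_copy)
print(rejected, library.topics["t_create_group"])
```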
First chance to see what you've done
Steve’s group uses the Apache FOP rendering engine
Rendering is handled by Vasont and a script written by the department
One click for PDF, Eclipse help, or XML
For Steve’s 500-page book, extract/render is a 30-minute process
My Julia Child moment
In Steve’s real life, this change had to be made to 30 topics