This presentation introduces the various approaches used to convert unstructured legacy content into something more useful - namely into a structured form such as that provided by DITA.
5. Building Advanced
Content Conversion,
Management & Publishing
Solutions for over 20 years
6. Tales from the Content Conversion Crypt
Memories of
Extreme Content Makeover
Four Common Approaches
Illustrative Examples of
Content Conversion Experiences
Practical Content Conversion
Key Lessons & Themes
9. Blood, Sweat and Tears Model of Conversion
Manual effort
deployed with
great industry
yields results
…over time
It can also be cruel….
conversion teams have been
“sequestered” before...I know...
10. Snake Oil and Conversion Magic
Some products
claim to provide
complete conversion
solutions “out-of-the-box”
One project licensed a
“Universal Converter”
and got…
11. Random Generator Conversion Environment
Information Technology (IT)
Team constructs a
custom conversion solution
using tools with which they
are familiar
Sometimes works but in
more complex scenarios
can lead to problems when
the programs don’t produce
the “expected” results
12. Over the Wall Content Conversion
Outsourced
conversion services
can be effective if
managed carefully
Often they are used
as a way to “pass the
ball” when the job
seems too difficult
Conversion services have
historically been a
challenging business
The problems don’t
usually go away
13. The Four Pillars of Content Conversion
The Four Conversion Strategies
Manual Effort
Conversion Products
Custom Conversion Environments
Out-sourced Content Conversion
There is Merit in Each of these Strategies
Elements of each may figure in any effective conversion strategy
Each may actually work in certain circumstances
The Key Point
Each conversion scenario is unique
Complexity is determined by “distance” between source & target
14. Sources: The Harsh Reality of Legacy Content
The Legacy Content Spectrum
Opaque
Not directly processable (e.g., paper / scanned images)
Annoying
Aggressively proprietary
Little or no predictability in usage
Polluted
Normally processable but frequently
filled with deviations & additions (HTML)
Tolerable
Documented format that exposes format
& structure in a processable form
Fortunately, popular formats are becoming
more and more “tolerable”
15. Additional Potential Obstacles
Things to watch out for:
Content that exists in multiple formats
Different renditions may be the best source for part of the content
Necessitates parallel conversions of sources & merge
Sophisticated
supporting content
Formulas
Vector graphics
Multimedia resources
Application code
16. An Inconvenient Truth – About Content
The truth is usually
a little rougher...
Some imagine that
content is always
cute, well-formed &
easily handled....
17. Demanding Targets
The conversion outputs are
becoming more challenging
Published products are growing
more sophisticated
Underlying content needs to be
modular, reusable & intelligent
[Diagram labels: Schema, Protocols, Content Instance, XML Validation, Content Verification, Transformation Processing, Outputs]
18. The Key Questions
Where are you?
A true assessment of
the state of your
content sources
Where are you
going?
A validated
understanding of the
output that you must
produce & the uses
to which it will be put
19. Practical Content Conversion
Best Practice for Content Conversion
Flexible posture
Leverages the best tools & techniques
Adapts to circumstances
Continuously looks for
automation opportunities
Deploys automation under
the guidance of the people
who understand the content
Leverages automation to:
Analyse sources
Perform transformations
Validate results
Analyse results
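As one illustration of using automation to analyse sources, the sketch below tallies the elements found in a legacy HTML source and lists those with no conversion rule yet. It is a minimal sketch only: the MAPPING table, the analyse_source function and the sample markup are hypothetical and do not come from the presentation.

```python
from collections import Counter
from html.parser import HTMLParser


class TagTally(HTMLParser):
    """Counts every start tag seen in a legacy HTML source."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        self.counts[tag] += 1


# Hypothetical source-to-target mapping (HTML element -> DITA element).
MAPPING = {"h1": "title", "p": "p", "ul": "ul", "li": "li", "b": "b"}


def analyse_source(html_text):
    """Return (tag counts, sorted list of tags that have no mapping rule)."""
    tally = TagTally()
    tally.feed(html_text)
    tally.close()
    unmapped = sorted(tag for tag in tally.counts if tag not in MAPPING)
    return tally.counts, unmapped


# Invented sample content, for illustration only.
sample = "<h1>Pump</h1><p>Check <b>seals</b>.</p><font>old</font><p>Done</p>"
counts, unmapped = analyse_source(sample)
```

A report like this gives the people who understand the content a concrete list of deviations to rule on before any transformation is run.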
20. Conversion Process Roadmap
[Diagram labels: Source Analysis; XML Target Schema; Source to Target Mapping; Subject Matter Experts (Interaction, Guidance); Legacy Source Content; Modify Existing Conversion Rules; Modified Conversion Rules; Execute Conversion Process; Manual Editing; Result Analysis; Identified Issues; 1 Example Set; 2 Sample Set 10%; 3 Complete Set 100%; Application Tests; Validation & Verification]
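The roadmap runs the conversion against progressively larger sets (an example set, a 10% sample, then the complete set). One way to draw the 10% sample reproducibly is sketched below. The function name, the fixed seed and the file names are assumptions made for illustration; the presentation does not say how the sample was chosen.

```python
import random


def pick_sample_set(documents, fraction=0.10, seed=0):
    """Pick a reproducible random sample of the source documents.

    A fixed seed means the same documents are chosen on every run, so
    results stay comparable while the conversion rules are being modified.
    """
    documents = list(documents)
    if not documents:
        return []
    size = max(1, round(len(documents) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(documents, size))


# Hypothetical file names, for illustration only.
all_docs = [f"doc{i:03}.xml" for i in range(50)]
sample_set = pick_sample_set(all_docs)
```

Random selection is only one option; a team may prefer to hand-pick documents known to contain the hardest structures.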
21. Case Study: Converting Drug Information
[Figure: grid of Drug 1 to Drug 4 against Scenario A to Scenario D; legend: Recommended, Optional, Not Recommended]
Migrating drug information into a
precise digital form presented
a critical challenge
Source:
Miles33, Quark
& vendor drug monographs
Target:
Logical data structures
needed to drive diagnostics
22. Case Study: Content Aggregation Services
Sources:
Paper
PDF
HTML
SGML
XML
Databases
…
23. To Burst or Not to Burst
Conversion
Outputs
Compare
Outputs
Content Modularity is not an end in itself
A business rationale must drive bursting & refactoring efforts
24. Case Study: Realizing Savings with Refactoring
Outcome of refactoring:
$100 million saved annually
26. But There’s More: Establishing Content Metadata
Internal Sources
Segments of content designated
as valuable metadata
Attributes available in source format
Keywords & abstract
Annotations
External Sources
System Data (file information)
Associated keywords & descriptions
Ratings & commentary
Process context
Additional information drawn from other
sources (e.g., part database)
[Diagram labels: Identify, Extract & Insert metadata; Ontology, Taxonomy, Topic, Link Network]
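The identify, extract and insert steps for metadata can be sketched as follows. This is a minimal sketch that assumes the keywords arrive as a comma-separated string from the source and that the target is a DITA topic, where keywords sit under prolog/metadata/keywords. The function names and sample content are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_keywords(raw):
    """Split a comma-separated keyword string, dropping blanks and repeats."""
    seen = []
    for part in raw.split(","):
        word = part.strip()
        if word and word not in seen:
            seen.append(word)
    return seen


def insert_keywords(topic, keywords):
    """Add keywords to a DITA topic under prolog/metadata/keywords."""
    prolog = topic.find("prolog")
    if prolog is None:
        # The prolog is placed before the body when a body exists.
        prolog = ET.Element("prolog")
        children = list(topic)
        body = topic.find("body")
        index = children.index(body) if body is not None else len(children)
        topic.insert(index, prolog)
    metadata = prolog.find("metadata")
    if metadata is None:
        metadata = ET.SubElement(prolog, "metadata")
    holder = metadata.find("keywords")
    if holder is None:
        holder = ET.SubElement(metadata, "keywords")
    for word in keywords:
        ET.SubElement(holder, "keyword").text = word
    return topic


# Invented sample topic and keyword string, for illustration only.
topic = ET.fromstring(
    '<topic id="t1"><title>Pump</title><body><p>Check seals.</p></body></topic>'
)
words = extract_keywords("pump, seal , pump,maintenance, ")
insert_keywords(topic, words)
```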
27. And Don’t Forget about the Links
Increasingly important
Essential for portals (enabling navigation)
Adding links
Source / target identification
Link specification
Link generation
Link validation
Link extraction
Link reporting
Link activation
Level of precision
is high as is
the potential for error
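Link validation is one of the steps above that lends itself to automation. The sketch below reports cross-references whose targets do not exist in a set of converted topics. It assumes a simplified convention in which every xref href has the form "#id" and points at an id somewhere in the set; real DITA hrefs are richer than this, and the function name and sample topics are hypothetical.

```python
import xml.etree.ElementTree as ET


def validate_links(topic_sources):
    """Return (topic id, href) pairs for xrefs that point at unknown ids."""
    roots = [ET.fromstring(text) for text in topic_sources]

    # Source / target identification: collect every id in the set.
    known_ids = set()
    for root in roots:
        for element in root.iter():
            if element.get("id"):
                known_ids.add(element.get("id"))

    # Link validation: every href must resolve to a known id.
    broken = []
    for root in roots:
        for xref in root.iter("xref"):
            href = xref.get("href", "")
            if href.lstrip("#") not in known_ids:
                broken.append((root.get("id"), href))
    return broken


# Invented sample topics, for illustration only.
topics = [
    '<topic id="t1"><title>A</title><body><p>'
    '<xref href="#t2"/> <xref href="#t9"/></p></body></topic>',
    '<topic id="t2"><title>B</title><body><p>Text</p></body></topic>',
]
broken_links = validate_links(topics)
```

The returned list doubles as a simple link report that can be handed to the people who know the content.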
28. Worth a Thousand Words & Special Handling
Graphics frequently
introduce unique challenges
Often occur in large numbers
Mismatch between sources
and targets can be major
Associated with a
separate processing
pipeline & quality
control steps
Frequently introduces
needs for specialized
software tools
Occasionally demands
manual intervention
Something practical can usually be done
29. Observations on Content Conversion
Numerous approaches exist
Each has a time & a place
Applicability depends on context
Where are you?
Where are you going?
Practical Content Conversion
Flexible approach to conversion
Selects from available tools &
techniques to find the best solution
Main Risk
Dogmatically sticking to one tool &
technique when change is demanded
30. Why is Content Conversion Important?
Past Investments in Content
Were expensive to make
Can be very valuable today
Can embody vital business knowledge
Can be costly to reproduce
Rescuing Legacy Content
Can be done efficiently & effectively
Can save precious resources today
Can prevent valuable knowledge
from slipping into oblivion
31. You can be a Content Conversion Hero
Provided that
you know:
Where you are
Where you
are going
Otherwise
you might
turn out
to be a
little less
impressive
32. Some References
Stilo Website
www.stilo.com
Stilo Migrate Online & On Demand Conversion Service
www.stilo.com/migrate & migrate.stilo.com
Whitepapers
www.gollner.ca
33. It All Comes Down to Understanding your Content
Content may look easy to handle
Sometimes content can turn nasty
34. The Answer Takes a Familiar Form
But do not under-estimate the power of the right tools
in the hands of the right people at the right time