Content Archaeology: Raiders of the Lost Art
Joe Gollner
VP Enterprise Publishing Solutions
Stilo International
jgollner@stilo.com
Copyright © Stilo International 2009
A 1994 Presentation that Addressed a Similar Theme
Nerd Alert
The Long Road to XML (1987...)
Building Advanced
  Content Conversion,
Management & Publishing
Solutions for over 20 years
Tales from the Content Conversion Crypt
  Memories of
  Extreme Content Makeover

  Four Common Approaches

  Illustrative Examples of
  Content Conversion Experiences

  Practical Content Conversion

  Key Lessons & Themes
The Essence of Content Conversion




    Got this!              Want that!
Extreme Content Makeover
                    It can happen
                        to you!
                     Your content
                    could become
                     “spectacular”
Blood, Sweat and Tears Model of Conversion
 Manual effort
 deployed with
 great industry
 yields results
 …over time




                      It can also be cruel….
                      conversion teams have been
                      “sequestered” before...I know...
Snake Oil and Conversion Magic


Some products
claim to provide
complete conversion
solutions “out-of-the-box”

One project licensed a
“Universal Converter”
and got…
Random Generator Conversion Environment

Information Technology (IT)
Team constructs a
custom conversion solution
using tools with which they
are familiar

Sometimes works but in
more complex scenarios
can lead to problems when
the programs don’t produce
the “expected” results
Over the Wall Content Conversion

Outsourced
conversion services
can be effective if
managed carefully

Often they are used
as a way to “pass the
ball” when the job
seems too difficult

The problems don’t
usually go away

                        Conversion services have
                           historically been a
                          challenging business
The Four Pillars of Content Conversion
  The Four Conversion Strategies
     Manual Effort
     Conversion Products
     Custom Conversion Environments
     Out-sourced Content Conversion

  There is Merit in Each of these Strategies
     Elements of each may figure in any effective conversion strategy
     Each may actually work in certain circumstances

  The Key Point
     Each conversion scenario is unique
     Complexity is determined by “distance” between source & target
Sources: The Harsh Reality of Legacy Content
  The Legacy Content Spectrum
    Opaque
       Not directly processable (e.g., paper / scanned images)
    Annoying
       Aggressively proprietary
       Little or no predictability in usage
    Polluted
       Normally processable but frequently
       filled with deviations & additions (HTML)
    Tolerable
       Documented format that exposes format
       & structure in a processable form
       Fortunately, popular formats are becoming
       more and more “tolerable”
Additional Potential Obstacles
 Things to watch out for:
 Content that exists in multiple formats
    Different renditions may be the best source for part of the content
    Necessitates parallel conversions of sources & merge

 Sophisticated
 supporting content
    Formulas
    Vector graphics
    Multimedia resources
    Application code
An Inconvenient Truth – About Content
                          The truth is usually
                          a little rougher...




 Some imagine that
 content is always
 cute, well-formed &
 easily handled....
Demanding Targets
 The conversion outputs are
 becoming more challenging

 Published products are growing
 more sophisticated

 Underlying content needs to be
 modular, reusable & intelligent

 [Figure: Schema, Protocols, Content Instance, XML Validation, Content Verification, Transformation Processing, Outputs]
The Key Questions
  Where are you?
    A true assessment of
    the state of your
    content sources



  Where are you
  going?
    A validated
    understanding of the
    output that you must
    produce & the uses
    to which it will be put
Practical Content Conversion
  Best Practice for Content Conversion
    Flexible posture
    Leverages the best tools & techniques
    Adapts to circumstances
    Continuously looks for
    automation opportunities
    Deploys automation under
    the guidance of the people
    who understand the content
    Leverages automation to:
        Analyse sources
        Perform transformations
        Validate results
        Analyse results
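
As one illustration of using automation to analyse sources, the sketch below profiles which tags and attributes actually occur in a piece of legacy HTML. It is a minimal example and not a tool from the presentation: the class name TagProfiler, the function profile_source and the sample markup are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagProfiler(HTMLParser):
    """Counts start tags and attribute usage seen in a legacy HTML source."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.attributes = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        self.tags[tag] += 1
        for name, _value in attrs:
            self.attributes[f"{tag}@{name}"] += 1


def profile_source(markup: str) -> TagProfiler:
    """Feed one source document through the profiler and return the counts."""
    profiler = TagProfiler()
    profiler.feed(markup)
    profiler.close()
    return profiler


# Hypothetical sample content, standing in for a real legacy file.
sample = '<p class="warn">Dose</p><p>Note</p><font size="2">old</font>'
profile = profile_source(sample)
print(profile.tags.most_common())
print(profile.attributes.most_common())
```

A profile like this gives the people who understand the content a factual starting point for deciding which deviations matter before any conversion rules are written.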
Conversion Process Roadmap
  [Figure: process flow diagram with the following labels]
  Target XML Schema
  Legacy Source Content
  Source Analysis
  Source to Target Mapping
  Interaction with Subject Matter Experts (Guidance)
  Existing Conversion Rules
  Modify Conversion Process
  Modified Conversion Rules
  Manual Editing
  Execute Conversion Process
  Result Analysis
  Identified Issues
  Interaction
  Application Tests
  Validation & Verification
  Complete
  Content sets processed in turn:
    1  Example Set
    2  Sample Set 10%
    3  Complete Set 100%
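
The roadmap's progression from an example set, to a 10% sample, to the complete set can be sketched as a simple staged loop. This is an assumed reading of the diagram rather than the presenter's implementation: the function run_staged_conversion, its parameters and the toy convert and find_issues callables are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def run_staged_conversion(
    documents: Sequence[str],
    convert: Callable[[str], str],
    find_issues: Callable[[str], List[str]],
    example_size: int = 2,
) -> Tuple[str, List[str]]:
    """Convert growing subsets and stop at the first stage that reports issues."""
    sample_size = max(1, (len(documents) + 9) // 10)  # 10%, rounded up
    stages = [
        ("example set", documents[:example_size]),
        ("sample set 10%", documents[:sample_size]),
        ("complete set 100%", documents),
    ]
    for name, subset in stages:
        issues: List[str] = []
        for doc in subset:
            issues.extend(find_issues(convert(doc)))
        if issues:
            # Hand the issues back so the conversion rules can be modified.
            return name, issues
    return "complete", []


# Toy stand-ins for a real conversion step and a real result analysis.
def convert(text: str) -> str:
    return f"<p>{text}</p>"


def find_issues(result: str) -> List[str]:
    return ["empty paragraph"] if result == "<p></p>" else []


documents = [f"para {i}" for i in range(30)]
documents[20] = ""  # a problem that only the complete set exposes
stage, issues = run_staged_conversion(documents, convert, find_issues)
print(stage, issues)
```

Stopping at the first stage with issues keeps the feedback loop short: rules are adjusted against a small set before time is spent on the full run.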
Case Study: Converting Drug Information
  [Figure: matrix of Drug 1 to Drug 4 against Scenario A to Scenario D, with the levels Recommended, Optional and Not Recommended]

  Migrating drug information into a
  precise digital form presented
  a critical challenge
  Source:
    Miles33, Quark
    & vendor drug monographs
  Target:
    Logical data structures
    needed to drive diagnostics
Case Study: Content Aggregation Services



Sources:
Paper
PDF
HTML
SGML
XML
Databases
…
To Burst or Not to Burst
  [Figure: Conversion, Outputs, Compare Outputs]
  Content Modularity is not an end in itself
     A business rationale must drive bursting & refactoring efforts
Case Study: Realizing Savings with Refactoring




   Outcome of refactoring:
 $100 million saved annually
Case Study: High Precision Content Conversion
But There’s More: Establishing Content Metadata
  Internal Sources
    Segments of content designated
    as valuable metadata
    Attributes available in source format
    Keywords & abstract
    Annotations

  External Sources
    System Data (file information)
    Associated keywords & descriptions
    Ratings & commentary
    Process context
    Additional information drawn from other
    sources (e.g., part database)

  [Figure labels: Ontology, Taxonomy, Link Network, Topic, metadata, Identify, Extract, Insert]
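
A small sketch may help show how internal and external sources of metadata can be gathered together. It assumes an XML source whose root element carries attributes and contains keyword elements; the element names, the function collect_metadata and the sample file are hypothetical and not taken from the presentation.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def collect_metadata(path: str) -> dict:
    """Gather metadata from inside the file (internal) and from the file system (external)."""
    root = ET.parse(path).getroot()
    # Internal: attributes available in the source format, plus keywords.
    internal = dict(root.attrib)
    internal["keywords"] = [k.text.strip() for k in root.iter("keyword") if k.text]
    # External: system data (file information).
    stat = os.stat(path)
    external = {"file_name": os.path.basename(path), "size_bytes": stat.st_size}
    return {"internal": internal, "external": external}


# Hypothetical source document written to a temporary folder for the example.
source = (
    '<manual id="m-42" lang="en">'
    "<keyword>pump</keyword><keyword>valve</keyword><body/>"
    "</manual>"
)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "manual.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(source)
    metadata = collect_metadata(path)

print(metadata)
```

Keeping the two groups separate in the result makes it easier to decide later which values should be inserted into the target content and which belong in a surrounding system.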
And Don’t Forget about the Links
  Increasingly important
  Essential for portals (enabling navigation)
  Adding links
     Source / target identification
     Link specification
     Link generation
     Link validation
     Link extraction
     Link reporting
     Link activation
  Level of precision
  is high as is
  the potential for error
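
Link validation in particular lends itself to automation. The sketch below checks that every link in a set of converted documents points at an identifier that exists somewhere in the set. The markup (a link element with a target attribute, and id attributes on topics) and the function validate_links are assumptions made for the example, not a format described in the presentation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def validate_links(documents: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (document name, target) pairs for links whose target id does not exist."""
    trees = {name: ET.fromstring(text) for name, text in documents.items()}
    # Source / target identification: every id in the set is a valid target.
    known_ids = {
        element.get("id")
        for tree in trees.values()
        for element in tree.iter()
        if element.get("id")
    }
    broken = []
    for name, tree in trees.items():
        for link in tree.iter("link"):
            target = link.get("target")
            if target not in known_ids:
                broken.append((name, target))
    return broken


# Hypothetical converted topics; "seal" is referenced but never defined.
docs = {
    "pump.xml": '<topic id="pump"><link target="valve"/><link target="seal"/></topic>',
    "valve.xml": '<topic id="valve"><link target="pump"/></topic>',
}
broken_links = validate_links(docs)
print(broken_links)
```

The resulting list doubles as a simple link report: here it contains the single entry for the missing "seal" target in pump.xml.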
Worth a Thousand Words & Special Handling
  Graphics frequently
  introduce unique challenges
    Often occur in large numbers
    Mismatch between sources
    and targets can be major
    Associated with a
    separate processing
    pipeline & quality
    control steps
    Frequently introduces
    needs for specialized
    software tools
    Occasionally demands
    manual intervention
                         Something practical can usually be done
Observations on Content Conversion
 Numerous approaches exist
   Each has a time & a place
   Applicability depends on context
      Where are you?
      Where are you going?

 Practical Content Conversion
   Flexible approach to conversion
   Selects from available tools &
   techniques to find the best solution

 Main Risk
   Dogmatically sticking to one tool &
   technique when change is demanded
Why is Content Conversion Important?
 Past Investments in Content
   Were expensive to make
   Can be very valuable today
   Can embody vital business knowledge
   Can be costly to reproduce

 Rescuing Legacy Content
   Can be done efficiently & effectively
   Can save precious resources today
   Can prevent valuable knowledge
   from slipping into oblivion
You can be a Content Conversion Hero
Provided that
you know:
Where you are
Where you
are going

 Otherwise
 you might
 turn out
 to be a
 little less
 impressive
Some References
  Stilo Website
    www.stilo.com

  Stilo Migrate Online & On Demand Conversion Service
    www.stilo.com/migrate & migrate.stilo.com

  Whitepapers
    www.gollner.ca
It All Comes Down to Understanding your Content




Content may look easy to handle




                                  Sometimes content can turn nasty
The Answer Takes a Familiar Form




   But do not under-estimate the power of the right tools
      in the hands of the right people at the right time

Content Archaeology (Keynote for DocTrain West March 2009)

Ensuring Information Quality (June 2008)
Ensuring Information Quality (June 2008)Ensuring Information Quality (June 2008)
Ensuring Information Quality (June 2008)Scott Abel
 
XML without Tears (J Gollner at Intelligent Content 2012)
XML without Tears (J Gollner at Intelligent Content 2012)XML without Tears (J Gollner at Intelligent Content 2012)
XML without Tears (J Gollner at Intelligent Content 2012)Joe Gollner
 
Introduction to Content Engineering
Introduction to Content EngineeringIntroduction to Content Engineering
Introduction to Content EngineeringJoe Gollner
 
20110507 Implementing Continuous Deployment
20110507 Implementing Continuous Deployment20110507 Implementing Continuous Deployment
20110507 Implementing Continuous DeploymentXebiaLabs
 
Framework Engineering_Final
Framework Engineering_FinalFramework Engineering_Final
Framework Engineering_FinalYoungSu Son
 
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...ICSM 2011
 
“It’s not rocket science!” Applying CMS and semantic enrichment to transform...
“It’s not rocket science!”  Applying CMS and semantic enrichment to transform...“It’s not rocket science!”  Applying CMS and semantic enrichment to transform...
“It’s not rocket science!” Applying CMS and semantic enrichment to transform...Sarah Silveri, RSI Content Solutions
 
Patterns, Components, and Code, Oh My!
Patterns, Components, and Code, Oh My!Patterns, Components, and Code, Oh My!
Patterns, Components, and Code, Oh My!Erin Malone
 
Converging Textual and Graphical Editors
Converging Textual  and Graphical EditorsConverging Textual  and Graphical Editors
Converging Textual and Graphical Editorsmeysholdt
 
Business Event Driven Architecture & Governance in Action
Business Event Driven Architecture & Governance in ActionBusiness Event Driven Architecture & Governance in Action
Business Event Driven Architecture & Governance in ActionHostedbyConfluent
 
Efficient Validation of Large Models using the Mogwaï Tool
Efficient Validation of Large Models using the Mogwaï ToolEfficient Validation of Large Models using the Mogwaï Tool
Efficient Validation of Large Models using the Mogwaï ToolGwendal Daniel
 
Model-driven Development of Model Transformations
Model-driven Development of Model TransformationsModel-driven Development of Model Transformations
Model-driven Development of Model TransformationsPieter Van Gorp
 
Building a semantic enterprise content management system from scratch v1
Building a semantic enterprise content management system from scratch v1Building a semantic enterprise content management system from scratch v1
Building a semantic enterprise content management system from scratch v1Ron Michael Zettlemoyer
 
Building a semantic enterprise content management system v2
Building a semantic enterprise content management system v2Building a semantic enterprise content management system v2
Building a semantic enterprise content management system v2Ron Michael Zettlemoyer
 
Talent Base Case: Nokia.com Content Strategy
Talent Base Case: Nokia.com Content StrategyTalent Base Case: Nokia.com Content Strategy
Talent Base Case: Nokia.com Content StrategyLoihde Advisory
 
The GoodRelations Ontology: Making Semantic Web-based E-Commerce a Reality
The GoodRelations Ontology: Making Semantic  Web-based E-Commerce a RealityThe GoodRelations Ontology: Making Semantic  Web-based E-Commerce a Reality
The GoodRelations Ontology: Making Semantic Web-based E-Commerce a RealityMartin Hepp
 
Evolving Web: Drupal 7 in Higher Education Case Study
Evolving Web: Drupal 7 in Higher Education Case Study Evolving Web: Drupal 7 in Higher Education Case Study
Evolving Web: Drupal 7 in Higher Education Case Study dergachev
 
Automating Data Pipelines: Moving away from Scripts and Excel
Automating Data Pipelines: Moving away from Scripts and ExcelAutomating Data Pipelines: Moving away from Scripts and Excel
Automating Data Pipelines: Moving away from Scripts and ExcelCloverDX
 

Similar to Content Archaeology (Keynote for DocTrain West March 2009) (20)

Ensuring Information Quality (June 2008)
Ensuring Information Quality (June 2008)Ensuring Information Quality (June 2008)
Ensuring Information Quality (June 2008)
 
XML without Tears (J Gollner at Intelligent Content 2012)
XML without Tears (J Gollner at Intelligent Content 2012)XML without Tears (J Gollner at Intelligent Content 2012)
XML without Tears (J Gollner at Intelligent Content 2012)
 
Introduction to Content Engineering
Introduction to Content EngineeringIntroduction to Content Engineering
Introduction to Content Engineering
 
NoSQL learnings from the world of Telco
NoSQL learnings from the world of TelcoNoSQL learnings from the world of Telco
NoSQL learnings from the world of Telco
 
20110507 Implementing Continuous Deployment
20110507 Implementing Continuous Deployment20110507 Implementing Continuous Deployment
20110507 Implementing Continuous Deployment
 
Framework Engineering_Final
Framework Engineering_FinalFramework Engineering_Final
Framework Engineering_Final
 
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
 
“It’s not rocket science!” Applying CMS and semantic enrichment to transform...
“It’s not rocket science!”  Applying CMS and semantic enrichment to transform...“It’s not rocket science!”  Applying CMS and semantic enrichment to transform...
“It’s not rocket science!” Applying CMS and semantic enrichment to transform...
 
Patterns, Components, and Code, Oh My!
Patterns, Components, and Code, Oh My!Patterns, Components, and Code, Oh My!
Patterns, Components, and Code, Oh My!
 
Converging Textual and Graphical Editors
Converging Textual  and Graphical EditorsConverging Textual  and Graphical Editors
Converging Textual and Graphical Editors
 
Business Event Driven Architecture & Governance in Action
Business Event Driven Architecture & Governance in ActionBusiness Event Driven Architecture & Governance in Action
Business Event Driven Architecture & Governance in Action
 
Efficient Validation of Large Models using the Mogwaï Tool
Efficient Validation of Large Models using the Mogwaï ToolEfficient Validation of Large Models using the Mogwaï Tool
Efficient Validation of Large Models using the Mogwaï Tool
 
Model-driven Development of Model Transformations
Model-driven Development of Model TransformationsModel-driven Development of Model Transformations
Model-driven Development of Model Transformations
 
Building a semantic enterprise content management system from scratch v1
Building a semantic enterprise content management system from scratch v1Building a semantic enterprise content management system from scratch v1
Building a semantic enterprise content management system from scratch v1
 
Building a semantic enterprise content management system v2
Building a semantic enterprise content management system v2Building a semantic enterprise content management system v2
Building a semantic enterprise content management system v2
 
Talent Base Case: Nokia.com Content Strategy
Talent Base Case: Nokia.com Content StrategyTalent Base Case: Nokia.com Content Strategy
Talent Base Case: Nokia.com Content Strategy
 
The GoodRelations Ontology: Making Semantic Web-based E-Commerce a Reality
The GoodRelations Ontology: Making Semantic  Web-based E-Commerce a RealityThe GoodRelations Ontology: Making Semantic  Web-based E-Commerce a Reality
The GoodRelations Ontology: Making Semantic Web-based E-Commerce a Reality
 
Evolving Web: Drupal 7 in Higher Education Case Study
Evolving Web: Drupal 7 in Higher Education Case Study Evolving Web: Drupal 7 in Higher Education Case Study
Evolving Web: Drupal 7 in Higher Education Case Study
 
Automating Data Pipelines: Moving away from Scripts and Excel
Automating Data Pipelines: Moving away from Scripts and ExcelAutomating Data Pipelines: Moving away from Scripts and Excel
Automating Data Pipelines: Moving away from Scripts and Excel
 
eccenca Introduction
eccenca Introductioneccenca Introduction
eccenca Introduction
 

More from Joe Gollner

A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)
A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)
A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)Joe Gollner
 
The Economics of Content (October 2019)
The Economics of Content (October 2019)The Economics of Content (October 2019)
The Economics of Content (October 2019)Joe Gollner
 
So You Want a CMS (Gnostyx Workshop Lavacon 2016)
So You Want a CMS (Gnostyx Workshop Lavacon 2016)So You Want a CMS (Gnostyx Workshop Lavacon 2016)
So You Want a CMS (Gnostyx Workshop Lavacon 2016)Joe Gollner
 
Managing Knowledge in the Fractal Enterprise (Retro Alert 1999)
Managing Knowledge in the Fractal Enterprise (Retro Alert 1999)Managing Knowledge in the Fractal Enterprise (Retro Alert 1999)
Managing Knowledge in the Fractal Enterprise (Retro Alert 1999)Joe Gollner
 
Digital Transformation and DITA
Digital Transformation and DITADigital Transformation and DITA
Digital Transformation and DITAJoe Gollner
 
Engineering Content: The Discipline of Designing Future-Ready Content
Engineering Content: The Discipline of Designing Future-Ready ContentEngineering Content: The Discipline of Designing Future-Ready Content
Engineering Content: The Discipline of Designing Future-Ready ContentJoe Gollner
 
Brave New World of Technical Communication
Brave New World of Technical CommunicationBrave New World of Technical Communication
Brave New World of Technical CommunicationJoe Gollner
 
Digital Transformation and the Business of Content (May 2017)
Digital Transformation and the Business of Content (May 2017)Digital Transformation and the Business of Content (May 2017)
Digital Transformation and the Business of Content (May 2017)Joe Gollner
 
Three Projects One Lesson (April 2017)
Three Projects One Lesson (April 2017)Three Projects One Lesson (April 2017)
Three Projects One Lesson (April 2017)Joe Gollner
 
CALS and Canadian Government Acquisition 1994
CALS and Canadian Government Acquisition 1994CALS and Canadian Government Acquisition 1994
CALS and Canadian Government Acquisition 1994Joe Gollner
 
Coordinating Markup Projects (CALS Expo 1995)
Coordinating Markup Projects (CALS Expo 1995)Coordinating Markup Projects (CALS Expo 1995)
Coordinating Markup Projects (CALS Expo 1995)Joe Gollner
 
Information 4.0 for Industry 4.0 (TCWorld 2016)
Information 4.0 for Industry 4.0 (TCWorld 2016)Information 4.0 for Industry 4.0 (TCWorld 2016)
Information 4.0 for Industry 4.0 (TCWorld 2016)Joe Gollner
 
Are You Ready for Content 4 0?
Are You Ready for Content 4 0?Are You Ready for Content 4 0?
Are You Ready for Content 4 0?Joe Gollner
 
Managing Software as Knowledge (2005)
Managing Software as Knowledge (2005)Managing Software as Knowledge (2005)
Managing Software as Knowledge (2005)Joe Gollner
 
Practical Steps Towards Integrated Content Management (Nov 2015)
Practical Steps Towards Integrated Content Management (Nov 2015)Practical Steps Towards Integrated Content Management (Nov 2015)
Practical Steps Towards Integrated Content Management (Nov 2015)Joe Gollner
 
The Dark Arts of Content Leadership
The Dark Arts of Content LeadershipThe Dark Arts of Content Leadership
The Dark Arts of Content LeadershipJoe Gollner
 
Integrated Content Management - Information Energy 2015 Keynote
Integrated Content Management - Information Energy 2015 KeynoteIntegrated Content Management - Information Energy 2015 Keynote
Integrated Content Management - Information Energy 2015 KeynoteJoe Gollner
 
DITA - What is it good for? (J Gollner 2015)
DITA - What is it good for? (J Gollner 2015)DITA - What is it good for? (J Gollner 2015)
DITA - What is it good for? (J Gollner 2015)Joe Gollner
 
Defining Intelligent Content (J Gollner Mar 2015)
Defining Intelligent Content (J Gollner Mar 2015)Defining Intelligent Content (J Gollner Mar 2015)
Defining Intelligent Content (J Gollner Mar 2015)Joe Gollner
 

More from Joe Gollner (20)

A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)
A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)
A Content Manifesto (Gnostyx CIDM IDEAS Conference 2020)
 
The Economics of Content (October 2019)
The Economics of Content (October 2019)The Economics of Content (October 2019)
Content Archaeology (Keynote for DocTrain West March 2009)

  • 1. Content Archaeology: Raiders of the Lost Art Joe Gollner VP Enterprise Publishing Solutions Stilo International jgollner@stilo.com Copyright © Stilo International 2009
  • 2. A 1994 Presentation that Addressed a Similar Theme
  • 4. The Long Road to XML (1987...)
  • 5. Building Advanced Content Conversion, Management & Publishing Solutions for over 20 years
  • 6. Tales from the Content Conversion Crypt Memories of Extreme Content Makeover Four Common Approaches Illustrative Examples of Content Conversion Experiences Practical Content Conversion Key Lessons & Themes
  • 7. The Essence of Content Conversion Got this! Want that!
  • 8. Extreme Content Makeover It can happen to you! Your content could become “spectacular”
  • 9. Blood, Sweat and Tears Model of Conversion Manual effort deployed with great industry yields results …over time It can also be cruel…. conversion teams have been “sequestered” before...I know...
  • 10. Snake Oil and Conversion Magic Some products claim to provide complete conversion solutions “out-of-the-box” One project licensed a “Universal Converter” and got…
  • 11. Random Generator Conversion Environment Information Technology (IT) Team constructs a custom conversion solution using tools with which they are familiar Sometimes works but in more complex scenarios can lead to problems when the programs don’t produce the “expected” results
  • 12. Over the Wall Content Conversion Outsourced conversion services can be effective if managed carefully Often they are used as a way to “pass the ball” when the job seems too difficult The problems don’t usually go away Conversion services have historically been a challenging business
  • 13. The Four Pillars of Content Conversion The Four Conversion Strategies Manual Effort Conversion Products Custom Conversion Environments Out-sourced Content Conversion There is Merit in Each of these Strategies Elements of each may figure in any effective conversion strategy Each may actually work in certain circumstances The Key Point Each conversion scenario is unique Complexity is determined by “distance” between source & target
  • 14. Sources: The Harsh Reality of Legacy Content The Legacy Content Spectrum Opaque Not directly processable (e.g., paper / scanned images) Annoying Aggressively proprietary Little or no predictability in usage Polluted Normally processable but frequently filled with deviations & additions (HTML) Tolerable Documented format that exposes format & structure in a processable form Fortunately, popular formats are becoming more and more “tolerable”
  • 15. Additional Potential Obstacles Things to watch out for: Content that exists in multiple formats Different renditions may be the best source for part of the content Necessitates parallel conversions of sources & merge Sophisticated supporting content Formulas Vector graphics Multimedia resources Application code
  • 16. An Inconvenient Truth – About Content The truth is usually a little rougher... Some imagine that content is always cute, well-formed & easily handled....
  • 17. Demanding Targets The conversion outputs are becoming more challenging Published products are growing more sophisticated Underlying content needs to be modular, reusable & intelligent [Diagram labels: Schema, Protocols, Content Instance, XML Validation, Content Verification, Transformation Processing, Outputs]
  • 18. The Key Questions Where are you? A true assessment of the state of your content sources Where are you going? A validated understanding of the output that you must produce & the uses to which it will be put
  • 19. Practical Content Conversion Best Practice for Content Conversion Flexible posture Leverages the best tools & techniques Adapts to circumstances Continuously looks for automation opportunities Deploys automation under the guidance of the people who understand the content Leverages automation to: Analyse sources Perform transformations Validate results Analyse results
  • 20. Conversion Process Roadmap [Diagram; labels include: Subject Matter Experts, Source Analysis, Target XML Schema, Source to Target Mapping, Legacy Content, Conversion Rules, Conversion Process, Manual Editing, Result Analysis, Identified Issues, Validation & Verification, Application Tests; stages: 1 Example Set, 2 Sample Set 10%, 3 Complete Set 100%]
  • 21. Case Study: Converting Drug Information Migrating drug information into a precise digital form presented a critical challenge Source: Miles33, Quark & vendor drug monographs Target: Logical data structures needed to drive diagnostics [Figure labels: Scenario A, Scenario B, Scenario C, Scenario D; Drug 1, Drug 2, Drug 3, Drug 4; Recommended, Optional, Not Recommended]
  • 22. Case Study: Content Aggregation Services Sources: Paper PDF HTML SGML XML Databases …
  • 23. To Burst or Not to Burst Conversion Outputs Compare Outputs Content Modularity is not an end in itself A business rationale must drive bursting & refactoring efforts
  • 24. Case Study: Realizing Savings with Refactoring Outcome of refactoring: $100 million saved annually
  • 25. Case Study: High Precision Content Conversion
  • 26. But There’s More: Establishing Content Metadata Internal Sources Segments of content designated as valuable metadata Attributes available in source format Keywords & abstract Annotations External Sources System Data (file information) Associated keywords & descriptions Ratings & commentary Process context Additional information drawn from other sources (e.g., part database) [Diagram labels: Identify, Extract, Insert metadata; Ontology, Taxonomy, Topic, Link Network]
  • 27. And Don’t Forget about the Links Increasingly important Essential for portals (enabling navigation) Adding links Source / target identification Link specification Link generation Link validation Link extraction Link reporting Link activation Level of precision is high as is the potential for error
  • 28. Worth a Thousand Words & Special Handling Graphics frequently introduce unique challenges Often occur in large numbers Mismatch between sources and targets can be major Associated with a separate processing pipeline & quality control steps Frequently introduces needs for specialized software tools Occasionally demands manual intervention Something practical can usually be done
  • 29. Observations on Content Conversion Numerous approaches exist Each has a time & a place Applicability depends on context Where are you? Where are you going? Practical Content Conversion Flexible approach to conversion Selects from available tools & techniques to find the best solution Main Risk Dogmatically sticking to one tool & technique when change is demanded
  • 30. Why is Content Conversion Important Past Investments in Content Were expensive to make Can be very valuable today Can embody vital business knowledge Can be costly to reproduce Rescuing Legacy Content Can be done efficiently & effectively Can save precious resources today Can prevent valuable knowledge from slipping into oblivion
  • 31. You can be a Content Conversion Hero Provided that you know: Where you are Where you are going Otherwise you might turn out to be a little less impressive
  • 32. Some References Stilo Website www.stilo.com Stilo Migrate Online & On Demand Conversion Service www.stilo.com/migrate & migrate.stilo.com Whitepapers www.gollner.ca
  • 33. It All Comes Down to Understanding your Content Content may look easy to handle Sometimes content can turn nasty
  • 34. The Answer Takes a Familiar Form But do not under-estimate the power of the right tools in the hands of the right people at the right time
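
Slides 19 and 27 mention using automation to validate results and to validate links. The sketch below is one small illustration of that idea, not something taken from the deck: it checks that a converted document is well-formed XML and that every internal link points at an ID that exists. The function name `check_links`, the element name `xref` and the attribute names `href` and `id` are all assumptions chosen for the example; a real target schema would dictate its own.

```python
import xml.etree.ElementTree as ET


def check_links(xml_text, link_tag="xref", target_attr="href", id_attr="id"):
    """Return (well_formed, broken_links) for one converted document.

    All tag and attribute names are hypothetical defaults for illustration.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed: no point checking links.
        return False, []

    # Collect every ID declared anywhere in the document.
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr)}

    # A link is broken when its target (minus a leading '#') is not a known ID.
    broken = [
        el.get(target_attr)
        for el in root.iter(link_tag)
        if el.get(target_attr, "").lstrip("#") not in ids
    ]
    return True, broken


sample = """<topic id="t1">
  <section id="s1"><xref href="#s2"/></section>
  <section id="s2"><xref href="#s9"/></section>
</topic>"""

well_formed, broken = check_links(sample)
print(well_formed, broken)
```

Run over an example set first, then a sample, then the complete set (the staging shown on slide 20), a report like this gives the people who understand the content a short list of issues to review rather than whole documents to proofread.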