Engineering Web Content (Web Content 2009)
1. Engineering Web Content:
Bringing Discipline & Automation
to the Business of Managing Content
Joe Gollner
VP Enterprise Publishing Solutions
jgollner@stilo.com
3. Engineering Web Content: Part 1 Topics
Introducing Content Engineering
Introducing the Content Processing Roadmap
Conversion (content modernization)
Refactoring (content optimization)
Profiling (metadata collection)
Aim:
Introduce the tools and techniques
that constitute a practical working framework for
discussing, designing, developing and deploying
content management and processing systems
4. The Truth about Content
We are faced with:
Massively expanding content volumes
Diversifying venues for content delivery
Proliferating format varieties
Rising expectations of users
Escalating specialization of content
Evolving interconnectedness of content
Multiplying problems related to content security
Continuing lifecycle challenges (obsolescence remains a risk)
Increasing complexity of content
(the reintegration of data & documents)
Growing recognition of the central importance of content
5. An Essential Response: Content Engineering
Working Definition
The application of
rigorous engineering discipline
to the design, development
and deployment of
content management and
processing systems
Distinguishing Features
Systematic approach
Progressive use of technology
Awareness of
Lifecycle considerations
Total cost of ownership
Solution scalability
6. Engineering and Content
Organizing work
Laying out
work spaces
Sequencing of
process steps
Optimizing tasks
Refining tools
Improving materials
Transferring results
between stages
Sharing resources
Performing
maintenance
Troubleshooting
problems
Differential Analyzer – Vannevar Bush (1930s)
7. Content Engineering
Content Engineering
Governing discipline
Goal-directed
Content Management
Protect Value
Content Processing
Enhance Value
People
Create Value
Planning
Designing
Authoring
Editing
8. Content Management Components
Content Management
Control
Organize resources, access
and lifecycle
Change
Facilitate the evolution of
content and the associated
services
Deploy
Enable the services
the content makes
possible
13. Converting Content
Conversion: changing the format of legacy content to make it increasingly
suitable for efficient management, revision, reuse and publishing.
14. The Harsh Reality of Legacy Content
The Legacy Content Spectrum
Opaque
Not directly processable (e.g., paper / scanned images)
Annoying
Aggressively proprietary
Little or no predictability in usage
Polluted
Normally processable but frequently
filled with deviations & additions (HTML)
Tolerable
Documented format that exposes format
& structure in a processable form
Fortunately, popular formats are becoming
more and more “tolerable”
15. Conversion Fundamentals
Conversion is unavoidable and always under-estimated
Conversion is fundamentally a matter of interpretation
Parsing the legacy format & layout
Inferring a meaning from this information
Correlating the format & layout to a target structure
Addressing problems introduced by format peculiarities
Leveraging the content itself to guide format interpretation
Enhancing interpretive rules by matching content patterns
Automating conversion typically relies on two stages:
Format Interpreter that can make sense of source formatting
Rules-based Correlation Processor that maps content into structures
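The two stages above can be sketched roughly as follows. This is a minimal illustration, not the tooling behind the slides: the style names ("caps", "indented", "plain"), the rule table and the target element names are all hypothetical, and a real format interpreter would read a word-processor or DTP format rather than plain-text cues.

```python
import re
from dataclasses import dataclass


@dataclass
class Run:
    """One line of legacy content plus the formatting the interpreter saw."""
    text: str
    style: str  # hypothetical style names: "caps", "indented", "plain"


def interpret_format(lines):
    """Stage 1: make sense of source formatting (here, plain-text cues only)."""
    runs = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.isupper():
            style = "caps"
        elif line.startswith("    "):
            style = "indented"
        else:
            style = "plain"
        runs.append(Run(stripped, style))
    return runs


# Stage 2 rules: (style, content pattern, target element), first match wins.
# The content patterns let the text itself guide the interpretation.
RULES = [
    ("caps", re.compile(r".*"), "title"),
    ("indented", re.compile(r"^\d+\.\s"), "step"),
    ("indented", re.compile(r".*"), "note"),
    ("plain", re.compile(r".*"), "para"),
]


def correlate(runs):
    """Stage 2: map each interpreted run onto a target structure."""
    mapped = []
    for run in runs:
        for style, pattern, element in RULES:
            if run.style == style and pattern.match(run.text):
                mapped.append((element, run.text))
                break
    return mapped


legacy = [
    "REPLACING THE FILTER",
    "Do this every six months.",
    "    1. Switch off the pump.",
    "    Wear gloves.",
]
result = correlate(interpret_format(legacy))
```

Keeping the rules in a table, separate from the interpreter, is what lets the people who understand the content adjust the mapping without touching the parsing code.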
16. Conversion Process Template
[Diagram: conversion process template]
Inputs: Target XML Schema, Legacy Source Content, Existing Conversion Rules
Source to Target Analysis, in interaction with Subject Matter Experts, yields Mapping Guidance
Modify Conversion Rules to produce Modified Conversion Rules
Execute Conversion Process, followed by Manual Editing
Result Analysis, with Identified Issues fed back through interaction
Validation & Verification and Application Tests
Conversion runs on three sets:
1. Example Set
2. Sample Set (10%)
3. Complete Set (100%)
17. The Key Questions
Where are you?
A true assessment of
the state of your
content sources
Where are you
going?
A validated
understanding of the
output that you must
produce & the uses
to which it will be put
18. Practical Content Conversion
Best Practice for Content Conversion
Flexible posture
Leverages the best tools & techniques
Adapts to circumstances
Continuously looks for
automation opportunities
Deploys automation under
the guidance of the people
who understand the content
Leverages automation to:
Analyse sources
Perform transformations
Validate results
Analyse results
19. Scenario: Converting Drug Information
[Figure: matrix of Drug 1 to Drug 4 against Scenario A to Scenario D, each cell rated Recommended, Optional or Not Recommended]
Migrating drug information into a
precise digital form presented
a critical challenge
Source:
Miles33, Quark
& vendor drug monographs
Target:
Logical data structures
needed to drive diagnostics
21. Refactoring Content
Refactoring: restructuring content, without loss of meaning, to improve its
suitability for management, maintenance and specifically reuse.
22. Aspects of Refactoring
Refactoring breaks down into two tasks
Bursting
Normalization
Content Bursting
Decomposing content into components optimized for reuse
Content Normalization
Systematic removal of redundancies to improve maintainability
Challenges
Ensuring content components remain meaningful & manageable
Maintaining a complete equivalence with the original
Adapting the linking mechanisms so they remain valid and functional
Usually entails introduction of an indirect referencing scheme
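A minimal sketch of bursting and normalization with an indirect referencing scheme, using only the Python standard library. The `section` and `manual` element names and the hash-based component identifiers are assumptions made for illustration. The slides do not prescribe any particular scheme.

```python
import hashlib
import xml.etree.ElementTree as ET


def burst_and_normalize(doc_xml, component_tag="section"):
    """Burst a document into components and store each distinct one once."""
    root = ET.fromstring(doc_xml)
    store = {}   # component id -> serialized component
    refs = []    # ordered indirect references that rebuild the original
    for component in root.findall(component_tag):
        serialized = ET.tostring(component, encoding="unicode").strip()
        # Identical content gets an identical id, so duplicates collapse.
        component_id = hashlib.sha1(serialized.encode("utf-8")).hexdigest()[:8]
        store.setdefault(component_id, serialized)
        refs.append(component_id)
    return store, refs


def reassemble(store, refs, root_tag="manual"):
    """Resolve the references to check equivalence with the original."""
    body = "".join(store[ref] for ref in refs)
    return "<%s>%s</%s>" % (root_tag, body, root_tag)


source = (
    "<manual>"
    "<section><p>Wear eye protection.</p></section>"
    "<section><p>Drain the tank.</p></section>"
    "<section><p>Wear eye protection.</p></section>"
    "</manual>"
)
store, refs = burst_and_normalize(source)
rebuilt = reassemble(store, refs)
```

Here the repeated warning is stored once and referenced twice, and comparing `rebuilt` with `source` is the equivalence check named under Challenges.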
23. To Burst or Not to Burst
[Diagram: Conversion Outputs and Compare Outputs]
Content Modularity is not an end in itself
A business rationale must drive bursting & refactoring efforts
26. Collecting Metadata
Metadata: a set of data that provides information about other data.
Collecting Metadata: extracting, validating, integrating, supplementing,
synchronizing and storing metadata from, and about, the content.
27. The Function of Metadata
Metadata is used to make the context of content explicit
Used to facilitate
Control
Security
Limitation of rights
Orderly storage & retrieval
Discovery
Searching
Navigating
Exchange
Surprisingly important point
The boundary between metadata and content is never completely clear
(Image: Yale University Library)
28. Sources of Metadata
Metadata can be supplied from an external source
System data
Captured when content is created / modified
Subject information
Declaring details about the subject matter
Keywords, short descriptions,…
Externally managed data about subject
Author contributions
Annotations, justifications, abstracts,…
Process context (critically important)
Relating content to business process events
Metadata can be extracted from the content
Specific aspects of the content are selected as valuable metadata
Often one of the more precise aspects of subject-specific markup
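The two sources of metadata can be combined along these lines. This is a sketch under stated assumptions: the `topic`, `title` and `drug` elements and the `author` / `modified` fields are hypothetical stand-ins for subject-specific markup and system data.

```python
import xml.etree.ElementTree as ET


def collect_metadata(topic_xml, system_data):
    """Merge externally supplied metadata with values extracted from content."""
    root = ET.fromstring(topic_xml)
    extracted = {
        "title": root.findtext("title", default="").strip(),
        # Subject-specific markup is often the most precise metadata source.
        "drugs": sorted({d.text.strip() for d in root.iter("drug") if d.text}),
    }
    metadata = dict(system_data)   # external: captured by the system
    metadata.update(extracted)     # internal: selected from the content
    return metadata


topic = (
    "<topic>"
    "<title>Dosage guidance</title>"
    "<p>Do not combine <drug>warfarin</drug> with <drug>aspirin</drug>.</p>"
    "<p>Monitor patients taking <drug>warfarin</drug>.</p>"
    "</topic>"
)
record = collect_metadata(topic, {"author": "jdoe", "modified": "2009-06-01"})
```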
29. Ontologies, Taxonomies & Metadata
The Meaning of Metadata
Metadata categories and values relate content to aspects of an Ontology
The Ontology provides the context for metadata
Ontologies
Describe a domain of knowledge
Can be used as the basis of:
Taxonomies (classification schemes)
Link networks
Context driven navigational aids
[Diagram: an Ontology giving context to metadata, with Topics arranged into a Taxonomy and a Link Network]
32. Engineering Web Content: Part 2 Topics
Introducing the Content Processing Roadmap (Continued)
Linking (content connection)
Publishing (content delivery)
Validating (content confirmation)
Aim:
Introduce the tools and techniques
that constitute a practical working framework for
discussing, designing, developing and deploying
content management and processing systems
34. Establishing Relationships
Explicit Links (Actual)
Identifier Source Target Type
A1
A2
Implicit Links (Potential)
Identifier Source Target Type
B1
B2
Reuse Links (Physical)
Identifier Resource Request Condition
R1
R2
Links: the connections or relationships between things that
represent a significant portion of the meaning and value of content
35. Relationship Considerations
Effective linking is central to content usability & value
Ability to provide content tailored to a specific user context depends on
being able to facilitate immediate access to additional information
Linking is highly contextual
Not all relationships are relevant at the same time
How relationships are presented is format and media specific
Often leads to additional rendition requirements for content objects
Multiple renditions of graphics (thumbnail, low-res, high-res)
Links have become acknowledged as First-Class Objects
Subject to specific management and processing measures
Ideally expressed & managed separately from the content (overlays)
Associated with metadata & constituting important content metadata
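Treating links as first-class objects held apart from the content might look like the sketch below. The field names follow the Identifier / Source / Target / Type columns of the explicit link table. The `LinkBase` class, its methods and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    identifier: str
    source: str
    target: str
    link_type: str


class LinkBase:
    """Links stored separately from the content they connect (an overlay)."""

    def __init__(self, known_ids):
        self.known_ids = set(known_ids)
        self.links = []

    def add(self, link):
        self.links.append(link)

    def outbound(self, content_id):
        """Every link that starts at this content object."""
        return [link for link in self.links if link.source == content_id]

    def inbound(self, content_id):
        """Where-cited lookup: every link that points at this object."""
        return [link for link in self.links if link.target == content_id]

    def broken(self):
        """Links whose target is not a known content object."""
        return [link for link in self.links if link.target not in self.known_ids]


base = LinkBase(known_ids={"topic-1", "topic-2"})
base.add(Link("A1", "topic-1", "topic-2", "see-also"))
base.add(Link("A2", "topic-2", "topic-9", "prerequisite"))
inbound_ids = [link.identifier for link in base.inbound("topic-2")]
broken_ids = [link.identifier for link in base.broken()]
```

Because the links live outside the content, inbound and broken-link questions can be answered without opening a single content object.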
36. Link Management
Link Analysis:
Outbound Links: Intact or broken
Transclusions: Where used
Inbound Links: Track-back / Where cited
External Links: Network participation
Link metadata
Increasingly important
Link Analysis
Increasingly complex
Significant processing
Leverages external storage of links & link metadata
Link generation becoming critical
[Diagram: a Link Base holding Outbound Link, Transclusion, Inbound Link, Bidirectional and External Link records together with link metadata]
37. Scenario: Forest Information Mall
FIM Interface
Search Functionality
Content Context
Finding content using a variety of familiar mechanisms and leading to applicable process areas
Process Context
Navigation through processes (areas) surfaces sets of relevant documents
Publish Process
[Diagram labels: Web Services, XML, Metadata, Web Sites, Databases, Tools, Contents, Contacts]
Lightweight deployment of XML & transformations to enable “process help”
39. Delivering Content
Resolve, Compile, Publish
Resolve: assemble content and instantiate applicable relationships
Compile: convert resolved content into a form suitable for rendition
Publish: render the content in the forms required by the context
40. The Goal: High Fidelity Automation
Print Publishing (PDF)
Web Publishing (Portal / Portable)
Deliver
- Resolve
- Compile
- Publish
Assembling the inputs (Map & View)
Content requested
Supporting assets
Applicable stylesheets & rules
Resolve into a processable whole
Compile formattable content representations
Publish final formatted renditions
[Diagram: Delivery Processing takes Content, Assets and an Output Plan, applies Publish Rules, Transformations, Templates and Output Variants, and resolves, compiles and renders them into Output Print Products (PDF) and Output Web Products (XHTML)]
41. The Publishing Pipeline
Resolution leverages CMS / Database services (selecting)
Compilation produces “simplest possible serialization”
All stages generate activity logs that feed a “quality report”
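The three stages and their shared activity log can be sketched as below. Everything here is illustrative: `REPOSITORY` is a hypothetical in-memory stand-in for the CMS / database services, and the list of titles only hints at a "simplest possible serialization".

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the CMS / database services.
REPOSITORY = {
    "intro": "<topic id='intro'><title>Overview</title></topic>",
    "setup": "<topic id='setup'><title>Setup</title></topic>",
}


def resolve(output_plan, log):
    """Assemble the requested content into one processable whole."""
    root = ET.Element("publication")
    for content_id in output_plan:
        if content_id in REPOSITORY:
            root.append(ET.fromstring(REPOSITORY[content_id]))
            log.append(("resolve", "ok", content_id))
        else:
            log.append(("resolve", "missing", content_id))
    return root


def compile_content(resolved, log):
    """Reduce the resolved tree to a simple serialization for rendering."""
    titles = [title.text for title in resolved.iter("title")]
    log.append(("compile", "ok", "%d titles" % len(titles)))
    return titles


def publish(compiled, log):
    """Render the compiled content, here as a minimal XHTML list."""
    items = "".join("<li>%s</li>" % title for title in compiled)
    log.append(("publish", "ok", "xhtml"))
    return "<ul>%s</ul>" % items


def quality_report(log):
    """Summarize the activity log as a count of problems per stage."""
    problems = {}
    for stage, status, _detail in log:
        if status != "ok":
            problems[stage] = problems.get(stage, 0) + 1
    return problems


log = []
resolved = resolve(["intro", "setup", "faq"], log)
page = publish(compile_content(resolved, log), log)
report = quality_report(log)
```

The request for the unknown "faq" item does not stop the run. It is logged by the resolve stage and surfaces in the quality report.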
42. Scenario: Performance Support Portals
Performance Support Portals depend upon content resources that are
intelligent and modular and that exhibit extremely high levels of quality
43. Implications for Content
What then is expected of content?
1. Content must be available as valid XML
2. Content must be modularized
3. Content must be meaningful in multiple contexts
4. Content must be discretely addressable
5. Content must be uniquely identifiable using metadata
6. Content must be linked to related content
7. Content must encourage modification & addition
8. Content must be processable with almost perfect confidence
This also has implications for the publishing process...
44. Content Processing & Validation
Validation
Essential capability
Enables consistent
processing
Streamlines
processes
Validation must be
Accurate
Manageable
Informative
Actionable
Pro-active
Continuously improving
45. Validate & Transform: Simple
Content Validation
DTD structural rules
Instance conformance
Content Transformation
Traditionally focused on arranging
content for formatting
Supporting primarily
structural manipulation
Validated Outputs
Inputs to rendition processes
HTML outputs
XML outputs
46. Validate & Transform: Complex
Content Validation & Verification
Schema structural rules
Rules governing content values
Instance conformance
Content Transformation
Continuous process of improvement
Parse, validate, align, verify…repeat
Manipulation of many content types
Validated Outputs
Inputs to rendition processes
HTML outputs
XML outputs
Data outputs for applications
[Diagram: a Content Instance passes through Structure Validation against the Schema, Content Verification against the Rules, and Transformation Processing to produce Outputs]
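The difference between structure validation and content verification can be sketched as two separate passes over the same instance. The rule tables and the `doses` / `dose` / `drug` / `amount` vocabulary are hypothetical, and the hand-rolled structural check is only a stand-in for a real DTD or schema validator, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural rules: element -> child elements it must contain.
STRUCTURE_RULES = {"dose": ["drug", "amount"]}

# Hypothetical content rules: element -> predicate over its text value.
CONTENT_RULES = {
    "amount": lambda text: text.isdigit() and 0 < int(text) <= 1000,
}


def validate_structure(root):
    """Check that required child elements are present."""
    issues = []
    for element in root.iter():
        for required in STRUCTURE_RULES.get(element.tag, []):
            if element.find(required) is None:
                issues.append("<%s> is missing <%s>" % (element.tag, required))
    return issues


def verify_content(root):
    """Check that element values satisfy the rules governing them."""
    issues = []
    for element in root.iter():
        rule = CONTENT_RULES.get(element.tag)
        text = (element.text or "").strip()
        if rule and not rule(text):
            issues.append("<%s> has an unacceptable value: %r" % (element.tag, text))
    return issues


instance = ET.fromstring(
    "<doses>"
    "<dose><drug>aspirin</drug><amount>75</amount></dose>"
    "<dose><drug>warfarin</drug><amount>5000</amount></dose>"
    "<dose><amount>ten</amount></dose>"
    "</doses>"
)
structure_issues = validate_structure(instance)
content_issues = verify_content(instance)
```

The second dose is structurally valid but fails verification, which is the case a structure-only check would let through.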
48. Content Solution Architecture Framework
[Diagram: the Content Architecture at the center, framed by Controls, Inputs, Outputs and Mechanisms]
Controls: Enterprise Programs, Specialized Domains
Inputs: Active Sources, External Sources, Legacy Sources
Outputs: Web Services, Print Services, Application Services
Content Architecture: Document Models, Ontology, Rules, Data
Functions: Integrate, Publishing, Discovery
Mechanisms
Users: Authors, Subject Matter Experts, Administrators, Information Architects, Developers
Resources: Budget, Personnel, Infrastructure
Tools: Content Management, Content Processing, Content Authoring, Development Tools, Web Services
49. Content Architecture
Content Architecture
Establishes governing model of the knowledge domain
The knowledge that has informed the content
The knowledge being encapsulated in the solutions
Supports multiple solution instances
[Diagram: Content Engineering sits above the Content Architecture, which underpins Content Management, Content Processing and Content Solution Architectures. Processing steps shown: Convert, Refactor, Collect, Relate, Transform, Resolve, Compile, Validate, Publish]
50. The Central Role of the Content Architecture
[Diagram: the Content Architecture connects Content Requirements, Service Requirements, Discovery Taxonomies, Specialized Information Types, Specialized Delivery Processes and Specialized Domains. Topic types shown: Concept, Task and Reference, specialized into Description, Procedure and Data. Processes shown: Annotation, Formatting, Effectivity and Change]
51. Content Solution Design Principles
The nature of content demands an adaptable architecture
Technology components should be loosely-coupled
Content must always be available in its simplest self-describing form
Data stores should be replaceable by stored instances
True for content, metadata and links
Content processing events can be performed many ways
Simple methods must be present, sophisticated methods may be
All interfaces established as the exchange of validated content
Processing rules are, themselves, managed & processable content
Content Processing should be extensively leveraged
Content validation, analysis and reporting at every stage
Used to manage & optimize solution components to improve efficiency
52. General Observations
Content is inherently complex
Current trends have moved content to the center of attention
Content Engineering is an essential response
Provides the necessary discipline & the conceptual framework
Content has not typically received this level of attention in the past
Effective Content Processing is central to success
Content Management services are enabled by content processes
Adaptive content processing is essential for addressing change
Effective Content Solutions are designed to cover the complete content
lifecycle and all stakeholder perspectives
The efficient management and processing of content remains an
elusive goal for most organizations
53. Content Engineering and Business Value
The design of Content Solutions should
Continuously minimize the costs of
acquiring, enriching, managing
and delivering content
Continuously improve content
resources through enrichment
Continuously increase the
benefits realized through
the delivery of content
Continuously reduce risks
threatening content assets or
the services being supported
Each of these represents an
increase in value
54. Top Ten Secrets of Content Engineering Success
Don’t underestimate your content or your business
Don’t underestimate the power of good automation
Choose an appropriate tool set and validate your choices
Don’t invest in content management technology too early
Carefully plan and execute migration activities
Take a “customer service” focus in delivering tangible
benefits (new products / services) from your investments
Be demanding of your suppliers (expect quality)
Engage your stakeholders and “take control” of the solution
Leverage standards, don’t be enslaved by them
Be an active part of the community as a way to learn and as
a way to share what you have learned
55. End of Part 2 – Engineering Web Content
Questions & Comments
Contact
Joe Gollner
VP e-Publishing Solutions
Stilo International
jgollner@stilo.com
joe@gollner.ca
otherwise…the End