From XML to eBooks
Part II: The Devil is in the Details

         Richard Hamilton
            XML Press
        hamilton@xmlpress.net
Slight Recap

For most tech comm situations:
● Two formats matter: ePub and Kindle
● XML processes (especially DocBook or DITA) will make things much easier
● Content strategy is the hardest part
● Authoring is the next hardest
● Production is tough, but doable
● Distribution is easiest
Overview

● Authoring
● Storing and managing content
● Producing output

Content strategy is critical, but it is not covered in this presentation.
Authoring
Authoring formats at XML Press:
● DocBook XML: 5 books
● DITA XML: 2 books (so far)
● Word: 4 books
● Wiki (Confluence): 1 book
● Wiki (PBworks): 3 books
● Author-it: 1 book
● InDesign: 1 book

All but 3 (1 each in Word, InDesign, and Author-it) were ultimately produced from XML.
Authoring in a Wiki

● Based on PBworks
● Authors, editor, reviewers, and indexers all work in the wiki
● Parallel access throughout most of the process
● Content exported for proofs as needed
● Content moved to SVN for final production

Requires a clean, clear breaking point where content moves from the wiki to SVN.
HTML to XHTML
Tidy:
● Convert
● Cleanup
Pre-process XHTML
XSL stylesheet:
● Remove empty elements
● Normalize
● Handle headings
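
As an illustration of the "remove empty elements" step, here is a minimal sketch in Python rather than XSLT. It is not the stylesheet used in the presentation. The function names, the set of tags that are kept even when empty, and the rule that elements carrying attributes are preserved (they may be link targets) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: these elements are empty by design and must survive the cleanup.
KEEP_TAGS = {"br", "hr", "img", "meta", "link", "td", "th"}


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def is_empty(elem):
    """True for an element with no children, attributes, or visible text."""
    return (
        len(elem) == 0
        and not elem.attrib
        and not (elem.text or "").strip()
        and local_name(elem.tag) not in KEEP_TAGS
    )


def remove_child(parent, child):
    """Remove child from parent, keeping any text that followed it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def prune_empty(elem):
    """Remove empty descendants bottom-up, so parents emptied by the
    cleanup are removed as well."""
    for child in list(elem):
        prune_empty(child)
        if is_empty(child):
            remove_child(elem, child)


if __name__ == "__main__":
    page = ET.fromstring(
        '<div xmlns="http://www.w3.org/1999/xhtml">'
        "<p>Text</p><p> </p><div><span/></div><br/>tail</div>"
    )
    prune_empty(page)
    print(ET.tostring(page, encoding="unicode"))
```

Working bottom-up matters here: a wrapper that contains only empty elements becomes empty itself once its children are gone.
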
Convert to DocBook
Herold:
● Infer hierarchy
● Convert
● Define structure
Process Supplemental Markup
Perl script:
● Index entries
● Footnotes
● Endnotes
● Sidebars
● Epigraphs
● Block quotes

Convert all to DocBook
Supplemental Markup
Indexing:
  {in primary; secondary; tertiary}
  {id term 1; term 2}
  {is id; primary; secondary; tertiary}
  {ie id}
  {is term; see term}

Footnotes, sidebars, etc.
  {if footnote text}
  {ib sidebar text}
  {ip epigraph text;;attribution;;source}
  {it endnote text}
  {iq quotation;;attribution}
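
To make the idea concrete, here is a rough Python sketch of how two of these codes could be turned into DocBook. The presentation used a Perl script whose details are not shown, so the regular expression, the function names, and the exact DocBook output are assumptions. Only `{in ...}` and `{if ...}` are handled, and the input is assumed to be text that is already XML-escaped.

```python
import re

# Matches {in ...} (index entry) and {if ...} (footnote); other codes are
# left untouched in this sketch.
MARKUP = re.compile(r"\{(in|if) ([^{}]*)\}")


def index_entry(body):
    """Turn 'primary; secondary; tertiary' into a DocBook <indexterm>."""
    parts = [part.strip() for part in body.split(";")]
    levels = ("primary", "secondary", "tertiary")
    inner = "".join(
        f"<{tag}>{text}</{tag}>" for tag, text in zip(levels, parts) if text
    )
    return f"<indexterm>{inner}</indexterm>"


def footnote(body):
    """Wrap footnote text in a DocBook <footnote>."""
    return f"<footnote><para>{body.strip()}</para></footnote>"


HANDLERS = {"in": index_entry, "if": footnote}


def convert(text):
    """Replace each recognized code with its DocBook equivalent."""
    return MARKUP.sub(lambda m: HANDLERS[m.group(1)](m.group(2)), text)


if __name__ == "__main__":
    print(convert("XML{in XML; editors} is common.{if See chapter 2.}"))
```

A dispatch table keyed on the code keeps each conversion small, and further codes such as `{ib ...}` could be added as extra handlers.
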
Cleanup
XSL stylesheet:
● Handle links
● Validate structure
What about Confluence?


The book Confluence, Tech Comm, Chocolate used K15t Software's DocBook export plugin, which also handles much of what the supplemental markup handles.
Storing and managing content
Content has one home, but...
● That home can change at certain well-defined points
● For XML, SVN is the home
● For a wiki, the wiki is the home until production, then SVN is the home
● Home changes once, irrevocably
● All production comes from SVN
ePub Structure
Top-level directory:
● mimetype (file): contains the string application/epub+zip, which identifies this as an ePub file
● META-INF (folder): contains container.xml (file), which points to the package file in the OEBPS folder
● OEBPS (folder): contents shown on the next slide

An ePub file is simply a zip file of this structure, with mimetype as the first file in the zip. It uses the .epub suffix.
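
The packaging rule can be sketched in Python with the standard `zipfile` module. This is an illustration, not the production script from the presentation. The helper names and the `OEBPS/package.opf` path inside `container.xml` are assumptions, and the skeleton below is not a complete ePub because it has no package file or content. The sketch also stores `mimetype` uncompressed, which the ePub container format requires in addition to it being the first entry.

```python
import tempfile
import zipfile
from pathlib import Path

# container.xml points to the package (OPF) file; the path is an assumption.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_skeleton(book_dir):
    """Create the top-level ePub layout: mimetype, META-INF, OEBPS."""
    book = Path(book_dir)
    (book / "META-INF").mkdir(parents=True, exist_ok=True)
    (book / "OEBPS").mkdir(exist_ok=True)
    # The mimetype file holds exactly this string, with no trailing newline.
    (book / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (book / "META-INF" / "container.xml").write_text(
        CONTAINER_XML, encoding="utf-8"
    )


def build_epub(book_dir, epub_path):
    """Zip book_dir into epub_path with mimetype first and uncompressed.

    epub_path is assumed to lie outside book_dir.
    """
    book = Path(book_dir)
    with zipfile.ZipFile(epub_path, "w") as epub:
        epub.write(book / "mimetype", "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        for path in sorted(book.rglob("*")):
            name = path.relative_to(book).as_posix()
            if path.is_file() and name != "mimetype":
                epub.write(path, name, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book_dir = Path(tmp) / "book"
        write_skeleton(book_dir)
        epub_path = Path(tmp) / "book.epub"
        build_epub(book_dir, epub_path)
        with zipfile.ZipFile(epub_path) as epub:
            print(epub.namelist())
```
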
Ebook production - DocBook
OEBPS Directory Contents

OEBPS (folder):
● OPF file: package.opf
● Navigation file: toc.ncx
● CSS file: xyz.css
● HTML TOC: ch01-toc.xhtml
● HTML content: ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, …, chXX.xhtml
● Media: figure.jpg, screen.png, ...

This folder is like any website.

Notes:
● Names are arbitrary
● Sub-folders are OK
NCX View in Kindle



[Screenshot: button for NCX view in emulator]
OPF (Open Packaging Format)
 <package ...>
   <!-- Metadata -->
   <metadata ...>
     … Dublin Core Metadata elements …
   </metadata>
   <!-- Manifest: what's in the ePub? -->
   <manifest>
     <item id="ncx" media-type="application/x-dtbncx+xml"
            href="toc.ncx"/>
     <item id="toc" media-type="application/xhtml+xml"
            href="ch01-toc.xhtml"/>
     <item id="ch01" media-type="application/xhtml+xml"
            href="ch01.xhtml"/>
     …
   </manifest>
   <!-- Spine: what order is it in? -->
   <spine toc="ncx">
     <itemref idref="cover"/>
     <itemref idref="toc"/>
     …
   </spine>
   <!-- Guide: where do you start? Edit the reference to change the starting place. -->
   <guide>
     <reference type="text" title="Startup page"
                 href="ch01.xhtml"/>
   </guide>
 </package>
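
Because every `idref` in the spine must match an `id` in the manifest, a quick consistency check can catch packaging mistakes before running epubcheck. The following Python sketch is one way to do that. The function name and the sample package are made up for illustration, and the sample deliberately omits a `cover` item to show what the check reports.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def unresolved_spine_refs(opf_text):
    """Return spine references (itemref idrefs and the toc attribute)
    that have no matching item id in the manifest."""
    root = ET.fromstring(opf_text)
    manifest_ids = {
        item.get("id")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine = root.find("opf:spine", OPF_NS)
    wanted = [ref.get("idref") for ref in spine.findall("opf:itemref", OPF_NS)]
    if spine.get("toc") is not None:
        wanted.append(spine.get("toc"))
    return [ref for ref in wanted if ref not in manifest_ids]


# Hypothetical package file: the manifest has no item with id="cover".
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    <itemref idref="ch01"/>
  </spine>
</package>"""

if __name__ == "__main__":
    print(unresolved_spine_refs(SAMPLE_OPF))
```
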
Other tweaks to XHTML

● Remove empty paragraphs (vestige of wiki past)
● Remove <p> around the first paragraph after an <li> (for the original Kindle)
● Work around a few epubcheck anomalies
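
The second tweak, unwrapping the first paragraph of a list item, might look something like this in Python. It is a sketch under assumptions, not the cleanup code used for the books: the function names are invented, and the rule applied is "if the first thing inside an `<li>` is a `<p>`, replace that `<p>` with its own content".

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def unwrap_first_para(li):
    """Replace a leading <p> child of <li> with the paragraph's content."""
    # Leave the item alone if it has no children or starts with real text.
    if len(li) == 0 or (li.text or "").strip():
        return
    first = li[0]
    if local_name(first.tag) != "p":
        return
    children = list(first)
    li.remove(first)
    li.text = first.text
    # Move the paragraph's inline children to the front of the <li>.
    for position, child in enumerate(children):
        li.insert(position, child)
    # Text that followed the paragraph goes after the moved content.
    if first.tail:
        if children:
            children[-1].tail = (children[-1].tail or "") + first.tail
        else:
            li.text = (li.text or "") + first.tail


def unwrap_list_paragraphs(root):
    """Apply unwrap_first_para to every <li> in the tree."""
    for elem in list(root.iter()):
        if local_name(elem.tag) == "li":
            unwrap_first_para(elem)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<ul><li><p>First <em>item</em>.</p><p>More</p></li></ul>"
    )
    unwrap_list_paragraphs(doc)
    print(ET.tostring(doc, encoding="unicode"))
```
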
ePub/Kindle from DocBook

● Based on open-source DocBook stylesheets
● ePub3 transform by Bob Stayton

● CSS added

● A few minor tweaks for personal preference

● Kindle (.mobi) produced using kindlegen

● Amazon tests .mobi and converts to smaller file
Generating ePub from DocBook
DocBook XSL:
● ePub3 transform
● Based on HTML5 transform
● Generates all ePub3 files
Generating ePub from DocBook
File cleanup:
● Adjust .opf file
● Clean up XHTML
Generating ePub from DocBook
File preparation:
● Copy images
● Copy in CSS file
● Run zip to create .epub file
ePub/Kindle from DITA
● Based on DITA Open Toolkit and DITA for Publishers toolkit extensions (developed by Eliot Kimber)
● Does not require content to use the DITA for Publishers specialization
● Generates ePub2-compliant files
● Kindle (.mobi) produced using kindlegen
● Amazon tests .mobi and converts to smaller file
Thanks for listening

     Richard Hamilton
        XML Press
     hamilton@xmlpress.net

From XML to eBooks Part 2: The Details

  • 1. From XML to eBooks Part II: The Devil is in the Details Richard Hamilton XML Press hamilton@xmlpress.net
  • 2. Slight Recap For most tech comm situations: ● Two formats matter: ePub & Kindle ● XML processes (esp., Docbook or DITA) will make things much easier ● Content strategy is the hardest part ● Authoring is next hardest ● Production is tough, but doable ●Distribution is easiest
  • 3. Overview ● Authoring ● Storing and managing content ●Producing output Content Strategy is critical, but not for this presentation
  • 4. Authoring Authoring formats at XML Press: ● DocBook XML: 5 books ● DITA XML: 2 books (so far) ● Word: 4 books ● Wiki (Confluence): 1 book ● Wiki (pbworks): 3 books ● Author-it: 1 book ● InDesign: 1 book All but 3 (1 each in Word, InDesign, & Author-it) were ultimately produced from XML
  • 5. Authoring in a Wiki ● Based on PBWorks ● Authors, editor, reviewers, indexers, work in wiki ● Parallel access throughout most of the process ● Content exported for proofs as needed ● Content moved to SVN for final production Requires a clean, clear breaking point where content moves from wiki to SVN
  • 6.
  • 7.
  • 8. HTML to XHTML Tidy ● Convert ● Cleanup
  • 9. Pre-process XHTML XSL Stylesheet ● Remove empty elements ● Normalize ● Handle headings
  • 10. Convert to DocBook Herold ● Infer hierarchy ● Convert ● Define structure
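The "infer hierarchy" step can be pictured as turning a flat run of headings into nested sections. The sketch below is only an illustration of that idea in Python; it is not Herold's actual algorithm, and the function name `nest_headings` and the `(level, title)` input shape are assumptions made for the example.

```python
def nest_headings(headings):
    """Turn [(level, title), ...] into nested {'title', 'children'} sections.

    Levels are assumed to be integers >= 1 (h1 = 1, h2 = 2, ...).
    """
    root = {"title": None, "children": []}
    stack = [(0, root)]  # (heading level, section) pairs, innermost last
    for level, title in headings:
        section = {"title": title, "children": []}
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(section)
        stack.append((level, section))
    return root["children"]


tree = nest_headings([(1, "Intro"), (2, "Scope"), (2, "Audience"), (1, "Setup")])
```

Here "Scope" and "Audience" end up as children of "Intro", while "Setup" starts a new top-level section.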
  • 11. Process Supplemental Markup Perl script ● Index entries ● Footnotes ● Endnotes ● Sidebars ● Epigraphs ● Block quotes ● Convert all to DocBook
  • 12. Supplemental Markup Indexing: {in primary; secondary; tertiary} {id term 1; term 2} {is id; primary; secondary; tertiary} {ie id} {is term; see term} Footnotes, sidebars, etc. {if footnote text} {ib sidebar text} {ip epigraph text;;attribution;;source} {it endnote text} {iq quotation;;attribution}
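A minimal sketch of how a few of these markers might be rewritten as DocBook. It is in Python rather than the Perl the slides mention, and it covers only `{in ...}`, `{if ...}`, and `{iq ...}`. The marker syntax comes from the slide; the function name `convert_markup`, the exact DocBook output, and the assumption that the surrounding text is already XML-escaped are illustrative choices, not the author's script.

```python
import re


def _indexterm(body):
    # {in primary; secondary; tertiary} -> one DocBook index term
    levels = [part.strip() for part in body.split(";") if part.strip()]
    tags = ["primary", "secondary", "tertiary"]
    inner = "".join(f"<{tag}>{value}</{tag}>" for tag, value in zip(tags, levels))
    return f"<indexterm>{inner}</indexterm>"


def _footnote(body):
    # {if footnote text} -> inline footnote
    return f"<footnote><para>{body.strip()}</para></footnote>"


def _blockquote(body):
    # {iq quotation;;attribution} -> block quote with optional attribution
    quote, _, attribution = body.partition(";;")
    attribution = attribution.strip()
    attr = f"<attribution>{attribution}</attribution>" if attribution else ""
    return f"<blockquote>{attr}<para>{quote.strip()}</para></blockquote>"


HANDLERS = {"in": _indexterm, "if": _footnote, "iq": _blockquote}
MARKER = re.compile(r"\{(in|if|iq)\s+([^{}]*)\}")


def convert_markup(text):
    """Replace the supported {xx ...} markers; leave everything else alone."""
    return MARKER.sub(lambda m: HANDLERS[m.group(1)](m.group(2)), text)


print(convert_markup("Use an XML editor{in XML; editors}."))
```

Markers the sketch does not know about, such as `{id ...}` or `{is ...}`, pass through unchanged, so they can be handled by a later pass.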
  • 13. Cleanup XSL stylesheet ● Handle links ● Validate structure
  • 14. What about Confluence? Confluence, Tech Comm, Chocolate used K15t Software's DocBook export plugin, which also handles much of what the supplemental markup handles.
  • 15. Storing and managing content Content has one home, but... ● That home can change at certain well-defined points ● For XML, SVN is the home ● For wiki, the wiki is the home until production, then SVN is the home ● Home changes once, irrevocably ● All production comes from SVN
  • 16. ePub Structure Top Level Directory: ● mimetype (file): contains application/epub+zip; identifies this as an ePub file ● META-INF (folder): holds container.xml (file), which points to the package file in the OEBPS folder ● OEBPS (folder): (next page) An ePub file is simply a zip file of this structure, with mimetype as the first file in the zip. Uses .epub suffix.
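The layout rules on this slide are easy to check mechanically. The following Python sketch inspects an .epub held in memory and reports problems with the container layout. The function name `check_epub_layout` and its messages are assumptions for illustration. The check that `mimetype` is stored uncompressed comes from the ePub container specification rather than from the slide.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_epub_layout(data):
    """Return a list of problems found in the ePub container layout."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first file in the zip")
        elif zf.read("mimetype") != EPUB_MIMETYPE:
            problems.append("mimetype has unexpected content")
        elif entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems
```

An empty list means the top-level structure matches the slide; it says nothing about whether the content inside OEBPS is valid, which is what epubcheck is for.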
  • 17. Ebook production - DocBook OEBPS Directory Contents OEBPS (folder): ● OPF file: package.opf ● Navigation file: toc.ncx ● CSS file: xyz.css ● HTML TOC: ch01-toc.xhtml ● Media: figure.jpg, screen.png, ... ● HTML Content: ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, … chXX.xhtml Notes: ● This folder is like any website ● Names are arbitrary ● Sub-folders ok
  • 18. NCX View in Kindle Button for NCX view in emulator
  • 19. Ebook production - DocBook OEBPS Directory Contents (repeats slide 17)
  • 20. OPF (Open Packaging Format)
        <package ...>
          <!-- Metadata -->
          <metadata ...>
            … Dublin Core Metadata elements …
          </metadata>
          <!-- What's in the ePub? -->
          <manifest>
            <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
            <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
            <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
            …
          </manifest>
          <!-- What order is it in? -->
          <spine toc="ncx">
            <itemref idref="cover"/>
            <itemref idref="toc"/>
            …
          </spine>
          <!-- Where do you start? Change the starting place here. -->
          <guide>
            <reference type="text" title="Startup page" href="ch01.xhtml"/>
          </guide>
        </package>
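The manifest answers "what's in the ePub?" and the spine answers "what order is it in?". A reader combines the two by looking up each spine `idref` in the manifest. The Python sketch below shows that lookup. The function name `reading_order` and the sample package are made up for illustration; the namespace URI is the standard OPF one.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, trimmed-down package file used only for this example.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="toc"/>
    <itemref idref="ch01"/>
  </spine>
</package>"""


def reading_order(opf_text):
    """Return the hrefs of the spine items, in reading order."""
    package = ET.fromstring(opf_text)
    # Manifest: item id -> href
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Spine: ordered list of ids that must exist in the manifest
    return [
        hrefs[ref.get("idref")]
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]


print(reading_order(SAMPLE_OPF))
```

A spine `idref` with no matching manifest item raises `KeyError` here, which is a quick way to catch the kind of mismatch epubcheck would also report.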
  • 21. Other tweaks to XHTML ● Remove empty paragraphs (vestige of wiki past) ● Remove <p> around first para after an <li> (for original Kindle) ● Work around a few epubcheck anomalies
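The slides do not show how the first two tweaks were implemented. As a rough sketch under that caveat, here is one way to do them in Python with the standard library: drop empty `<p>` elements, and unwrap a `<p>` that is the first child of an `<li>`. The function name `tidy_xhtml` and the helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)
P = f"{{{XHTML}}}p"
LI = f"{{{XHTML}}}li"


def _append_text(parent, index, text):
    """Attach text at the position where the child at `index` would start."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def _is_empty(p):
    return len(p) == 0 and not (p.text or "").strip()


def tidy_xhtml(root):
    # 1. Remove empty <p> elements, keeping any text that followed them.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == P and _is_empty(child):
                index = list(parent).index(child)
                parent.remove(child)
                _append_text(parent, index, child.tail)
    # 2. Unwrap a <p> that is the first thing inside an <li>.
    for li in list(root.iter(LI)):
        if len(li) and li[0].tag == P and not (li.text or "").strip():
            p = li[0]
            li.remove(p)
            li.text = p.text
            kids = list(p)
            for i, kid in enumerate(kids):
                li.insert(i, kid)
            _append_text(li, len(kids), p.tail)
    return root
```

Later paragraphs in the same list item keep their `<p>` wrappers; only the first one is unwrapped, matching the wording on the slide.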
  • 22. ePub/Kindle from DocBook ● Based on open-source DocBook stylesheets ● ePub3 transform by Bob Stayton ● CSS added ● A few minor tweaks for personal preference ● Kindle (.mobi) produced using kindlegen ● Amazon tests .mobi and converts to smaller file
  • 23. Generating ePub from DocBook DocBook XSL ● ePub3 transform ● Based on HTML5 transform ● Generates all ePub3 files
  • 24. Generating ePub from DocBook File cleanup ● Adjust .opf file ● Clean up XHTML
  • 25. Generating ePub from DocBook File preparation ● Copy images ● Copy in CSS file ● Run zip to create .epub file
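The final "run zip" step has one wrinkle: `mimetype` has to be the first entry in the archive, and the ePub container specification also requires it to be stored uncompressed. The slides use the zip command for this. The Python sketch below does the same job with the standard library; the function name `build_epub` is an assumption, and it expects the output file to live outside the source directory.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip an unpacked ePub directory, writing mimetype first and uncompressed."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # mimetype goes in first, with no compression.
        zf.write(source / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else (META-INF, OEBPS, ...) follows, compressed.
        for path in sorted(source.rglob("*")):
            rel = path.relative_to(source).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Calling `build_epub("book", "book.epub")` on a directory laid out as in the ePub Structure slide produces a file that can then be run through epubcheck.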
  • 26. ePub/Kindle from DITA ● Based on DITA Open Toolkit and DITA for Publishers toolkit extensions (developed by Eliot Kimber) ● Does not require content to use DITA for Publishers specialization. ● Generates ePub2 compliant files ● Kindle (.mobi) produced using kindlegen ● Amazon tests .mobi and converts to smaller file
  • 27. Thanks for listening Richard Hamilton XML Press hamilton@xmlpress.net