LavaCon 2012 presentation about creating eBooks from DocBook XML. This presentation details the XML Press process for creating eBooks. A companion presentation (From XML to eBooks Part 1: Overview) provides an introduction.
From XML to eBooks Part 2: The Details
1. From XML to eBooks
Part II: The Devil is in the Details
Richard Hamilton
XML Press
hamilton@xmlpress.net
2. Slight Recap
For most tech comm situations:
● Two formats matter: ePub & Kindle
● XML processes (especially DocBook or DITA)
will make things much easier
● Content strategy is the hardest part
● Authoring is next hardest
● Production is tough, but doable
● Distribution is easiest
3. Overview
● Authoring
● Storing and managing content
● Producing output
Content Strategy is critical,
but not for this presentation
4. Authoring
Authoring formats at XML Press:
● DocBook XML: 5 books
● DITA XML: 2 books (so far)
● Word: 4 books
● Wiki (Confluence): 1 book
● Wiki (PBworks): 3 books
● Author-it: 1 book
● InDesign: 1 book
All but 3 (1 each in Word, InDesign, & Author-it)
were ultimately produced from XML
5. Authoring in a Wiki
● Based on PBworks
● Authors, editor, reviewers, and indexers all work in the wiki
● Parallel access throughout most of the process
● Content exported for proofs as needed
● Content moved to SVN for final production
Requires a clean, clear breaking point
where content moves from wiki to SVN
14. What about Confluence?
Confluence, Tech Comm, Chocolate used K15t
Software's DocBook export plugin, which
also handles much of what the
supplemental markup handles.
15. Storing and managing content
Content has one home, but...
● That home can change at certain
well-defined points
● For XML, SVN is the home
● For wiki, the wiki is the home until
production, then SVN is the home
● Home changes once, irrevocably
● All production comes from SVN
16. ePub Structure
Top Level Directory
● mimetype (file): contains application/epub+zip;
identifies this as an ePub file
● META-INF (folder): holds container.xml (file), which
points to the package file in the OEBPS folder
● OEBPS (folder): contents on next page
An ePub file is simply a zip file of this
structure, with mimetype as the first
file in the zip. Uses the .epub suffix.
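The packaging rule above can be sketched in Python. This is a minimal sketch, not the XML Press build script: the function name build_epub, the directory layout it expects, and the package file name OEBPS/package.opf (taken from the next slide) are assumptions for illustration. The mimetype entry is written first and stored uncompressed, which is what the ePub container format expects.

```python
import zipfile
from pathlib import Path

# Assumed package location; the slides use OEBPS/package.opf.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(source_dir, epub_path):
    """Zip source_dir (assumed to hold an OEBPS/ folder) into an .epub file.

    Hypothetical helper: it writes mimetype and META-INF/container.xml
    itself, then adds every file found under source_dir.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # mimetype goes in first; a bare ZipInfo defaults to ZIP_STORED,
        # so the entry is not compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip")
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(source_dir).as_posix()
            # Skip files this function has already written.
            if rel in ("mimetype", "META-INF/container.xml"):
                continue
            zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Calling build_epub("build", "book.epub") would package build/OEBPS/... into book.epub; write the output outside the source directory so it is not zipped into itself.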
17. Ebook production - DocBook
OEBPS Directory Contents
OEBPS (folder)
● OPF file: package.opf
● Navigation file: toc.ncx
● CSS file: xyz.css
● HTML TOC: ch01-toc.xhtml
● HTML content: ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, … chXX.xhtml
● Media: figure.jpg, screen.png, ...
This folder is like any website
Notes:
● Names are arbitrary
● Sub-folders ok
18. NCX View in Kindle
Button for NCX view
in emulator
20. OPF (Open Packaging Format)
<package ...>
  <!-- Metadata -->
  <metadata ...>
    … Dublin Core Metadata elements …
  </metadata>
  <!-- Manifest: what's in the ePub? -->
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml"
          href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml"
          href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml"
          href="ch01.xhtml"/>
    …
  </manifest>
  <!-- Spine: what order is it in? -->
  <spine toc="ncx">
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    …
  </spine>
  <!-- Guide: where do you start? -->
  <guide>
    <reference type="text" title="Startup page"
               href="ch01.xhtml"/>
  </guide>
</package>
Tweak: change the starting place in <guide>
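To make the manifest/spine relationship concrete, here is a small Python sketch that resolves the spine's idref values against the manifest to get the reading order. The sample OPF is an assumption modeled on the slide (trimmed to entries that appear in both manifest and spine), and reading_order is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Assumed sample, modeled on the slide; not a complete package file.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
    <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="toc"/>
    <itemref idref="ch01"/>
  </spine>
</package>
"""


def reading_order(opf_text):
    """Return manifest hrefs in spine order."""
    root = ET.fromstring(opf_text)
    # Manifest: id -> href for everything in the ePub.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Spine: each itemref points back at a manifest id.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

With the sample above, reading_order(SAMPLE_OPF) returns the TOC page followed by chapter 1. An idref with no matching manifest item raises KeyError, which is a reasonable way to catch a broken package file early.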
21. Other tweaks to XHTML
● Remove empty paragraphs (vestige of wiki past)
● Remove <p> around first para after an <li> (for
original Kindle)
● Work around a few epubcheck anomalies
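The first tweak, removing empty paragraphs, might look like the following in Python. This is a sketch under assumptions, not the XML Press implementation: it assumes the XHTML is well-formed XML without undeclared named entities (such as &nbsp;), and remove_empty_paragraphs is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def remove_empty_paragraphs(xhtml_text):
    """Drop <p> elements with no child elements and only whitespace text.

    Returns (serialized XHTML, number of paragraphs removed).
    """
    # Serialize XHTML elements without a namespace prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    p_tag = "{%s}p" % XHTML_NS
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = (
                child.tag == p_tag
                and len(child) == 0
                and not (child.text or "").strip()
            )
            if is_empty:
                # Removing the element also drops its tail text, which
                # between block elements is normally just whitespace.
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

A paragraph that contains only an image is kept, because it has a child element.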
22. ePub/Kindle from DocBook
● Based on open-source DocBook stylesheets
● ePub3 transform by Bob Stayton
● CSS added
● A few minor tweaks for personal preference
● Kindle (.mobi) produced using kindlegen
● Amazon tests .mobi and converts to smaller file
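The kindlegen step can be scripted. The sketch below assumes kindlegen is installed and on the PATH, and uses its -o option, which takes a bare output file name (the .mobi is written next to the input file). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def kindlegen_command(epub_path, mobi_name=None, kindlegen="kindlegen"):
    """Build the argument list for converting an ePub to .mobi."""
    epub_path = Path(epub_path)
    if mobi_name is None:
        # Default: same base name as the ePub, with a .mobi suffix.
        mobi_name = epub_path.with_suffix(".mobi").name
    if Path(mobi_name).name != mobi_name:
        raise ValueError("mobi_name must be a bare file name, not a path")
    return [kindlegen, str(epub_path), "-o", mobi_name]


def run_kindlegen(epub_path, mobi_name=None):
    """Run kindlegen and return its exit status for the caller to inspect."""
    return subprocess.run(kindlegen_command(epub_path, mobi_name)).returncode
```

Keeping command construction separate from execution makes it easy to log or dry-run the command before a production build.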
23. Generating ePub from DocBook
DocBook XSL ePub3 transform
● Based on the HTML5 transform
● Generates all ePub3 files
25. Generating ePub from DocBook
File preparation
● Copy images
● Copy in CSS file
Run zip to create .epub file
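The copy steps in file preparation could be automated along these lines. This is only a sketch: prepare_oebps, the flat image folder, and the list of image suffixes are assumptions, not the actual XML Press build. The final zip step is left out; as slide 16 notes, mimetype has to be the first file in the archive.

```python
import shutil
from pathlib import Path

# Assumed set of image types; adjust to whatever the book uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def prepare_oebps(oebps_dir, image_dir, css_file):
    """Copy images and the CSS file into the generated OEBPS folder.

    Returns the names of the files copied, images first, then the CSS.
    """
    oebps_dir = Path(oebps_dir)
    oebps_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(image_dir).iterdir()):
        # Only top-level image files are copied; other files are ignored.
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(src, oebps_dir / src.name)
            copied.append(src.name)
    css_file = Path(css_file)
    shutil.copy2(css_file, oebps_dir / css_file.name)
    copied.append(css_file.name)
    return copied
```

Because names and sub-folders inside OEBPS are arbitrary, a real build might instead mirror the source image folder structure, as long as the paths match what the generated XHTML and OPF manifest reference.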
26. ePub/Kindle from DITA
● Based on DITA Open Toolkit and DITA for
Publishers toolkit extensions (developed by Eliot
Kimber)
● Does not require content to use DITA for
Publishers specialization.
● Generates ePub2-compliant files
● Kindle (.mobi) produced using kindlegen
● Amazon tests .mobi and converts to smaller file