Multiple Formats, One Source: Supporting Multi-Format Publishing with a Single Source of Content
Alexander Berman
Content Manager
Kaplan Publishing
@Alex_Berman647 @LavaCon
#5 Shared Service
Graduate programs
• GMAT
• GRE
• MCAT
Pre-College Programs
• SAT
• AP Tests
• ACT
• PSAT
Health Programs
• NCLEX RN
• NCLEX PN
• USMLE Board Tests
Bar review
• Comprehensive and Multi-State bar review for aspiring lawyers
Shared services
• Responsible for supporting the SBUs.
• KPub, Tech Support, and Call Center are all shared services.
Here’s our product development process through to print. It starts with a Word document, Google Docs, etc. This is where the author or SME writes the content. After some initial work, the content is moved into Adobe InDesign, where it’s laid out and formatted for publication. After the design has been approved and the title has been proofread (the content will switch between PDF and INDD), it’s sent to a printer, who sends us back a PDF of the document. The final product is printed and shipped to warehouses, where it’s distributed to various sellers.
The main starting point
This process outline is one you’ve already seen. It shows a simplified version of the product development process from conception to print. I’ve included a second format (INDD → ePub) for this example. So how does this exemplify too much manual work?
I’m going to outline, at a basic level, all of the manual steps that need to occur within the product production process.
Word to INDD: The author writes the required content. It’s worked over by a copyeditor in its original format and then moved into Adobe InDesign, where it’s worked over by the typesetter and proofreader. This process is overseen by the production editor and is aimed at making sure that the content looks good and is grammatically sound. The production editor then sends it to the printer, who prints the PDF.
The same basic process is repeated for generating an ePub. It is repeated because ePub has different content and structural standards than print. Due to inconsistencies in our source INDD content, our current production processes require that it be proofread and typeset again. Additionally, the ePub requires manual QA due to bugs that are generated as a result of Kpub’s usage of InDesign.
With every new format, additional manual work is added to the process, which in turn makes it harder and harder to scale product production. It should also be noted that we do have a single source of content for our multi-format production process: Adobe InDesign. However, there are challenges associated with using InDesign as a source that necessitate manual, repeated work for any export. I am going to go into more detail on this subject in a few slides.
Branching Effect: each branch is a one-way street. Any change to the content in one branch needs to be made in ALL branches where that content is used. That means that the typesetting (and, if necessary, proofreading) that occurs when creating, for example, a print product needs to be repeated when creating an XML-based product, again for an ePub product, and so on, since using our INDD content unaltered could lead to non-matching products across multiple formats.
So let’s say that a significant error is found in 4 paragraphs in the print product. This error requires that the paragraphs be rewritten. The resulting new paragraphs are 6 lines shorter. For the print product this means that it’ll need to be re-typeset, since the text won’t reflow automatically. Without redoing the reflow of the book, it’s entirely possible that different components such as illustrations, graphs, and assessment support materials (such as equation pages) will be misplaced. This is just for print. The same process of re-typesetting (at the minimum) would need to be repeated across each production “branch.” This might sound superfluous, since ePub and XML-based content will reflow automatically, but as I said before, the challenge is with our usage of InDesign. To make sure that our content presentation and structure are consistent across multiple formats, repeating the typesetting, and possibly proofreading, processes is necessary.
As you can imagine, this kind of work is highly inefficient and time consuming since it requires so much manual repetition. Because the source content is structurally inconsistent, it is not possible to simply transform content from one format to another, even if you have the appropriate plug-ins. Instead, all updates are made manually, which in turn increases the likelihood of additional errors occurring. Basically, it’s a potentially endless loop.
As I’ve alluded to in previous slides, we’ve encountered challenges in using InDesign as a source. I want to make it clear that the challenge isn’t with Adobe InDesign, the program, but rather with Kpub’s usage of the program. When we were doing only print books, HOW you got the content to look as it should wasn’t a concern. This means that while our content generally has a consistent look and feel, the backend isn’t necessarily as neat. The two main culprits (in our usage of InDesign) are character and paragraph styles. According to Adobe, a character style is a collection of character formatting attributes that can be applied to text in a single step. A paragraph style combines character and paragraph formatting attributes, and can be applied to a single paragraph or to multiple paragraphs. For our purposes it meant that as long as the collection of attributes produced the appropriate looking content, it didn’t matter what those attributes were or what the different styles were called. It’s the same challenge as using XML or HTML without strict guidelines governing the placement and relationships of element sets. The challenge in using INDD as a source for multiple content formats is that it’s a tool designed primarily for print production. There are plug-ins (official and otherwise), but they rely upon well-formed INDD content, since the target formats require similar well-formedness. Unfortunately, most of our INDD source content was developed before a well-formed content structure was necessary. Thus it cannot be easily transformed to multiple content formats, since it does not conform to many content structure standards. What this implies is that our usage of character and paragraph styles (naming, placement, relationships, etc.) was not consistent from title to title or even from chapter to chapter. Why?
Because print book production adheres to a consistent visually based standard, NOT a content structure standard like you’d find with DITA XML or XHTML.
This seems fairly straightforward, but it’s important to remember that in the late 1990s, when digital products like eBooks were first developed, they seemed like a flash in the pan (and at the time, they were). This time around the concept has stuck and is slowly but surely gaining market share against traditional print products. What does this mean? It means that where a visually based standard was enough for print production, it is not enough for non-print products, which require content structure standards as well.
It’s a classic case of a kid in a candy shop. Too many pretty things (InDesign really does give you a plethora of ways to get your job done) and not enough rules lead to toothaches down the road.
Let me make it clear that print products DO utilize a content standard (INDD: visual appearance), but for non-print products that’s not enough. With non-print products, content requires both a standardized LOOK (visual content standard) and a standardized content structure (XML DTDs). Why?
The number of digital formats and services leads to uncertainty regarding display, since variances such as screen size, resolution, hardware restrictions, user requirements, languages, and formats can result in one format having different display requirements than another. So if visual requirements can change, then the content itself requires a standardized structure. Using XML isn’t enough, though. The challenge is that, as with using a visual standard, there are potentially dozens of ways to create valid XML. For this reason, it’s super important that any XML usage is governed by a set of rules, such as DTD/MOD files. These govern exactly what elements can be used where, and what their relationship is to other elements and structures. This allows for repeatable content structures, which in turn allows for scalable multi-format publishing. In short, lacking content structure standards does not mean we had no standards at all. It means that having a visual standard alone isn’t enough for non-print content. Non-print content requires additional standards (content structure standards) to ensure that the content looks as it should (no matter the platform or format) AND that the content is structured in a consistent and repeatable fashion. One other thing is that content standards also need to be enforced. Otherwise, why have them?
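To make the idea of rules governing "what elements can be used where" concrete, here is a minimal sketch in Python. It is not a DTD and it is not how KTP DITA is validated; the element names and the `ALLOWED_CHILDREN` table are hypothetical, and a real DTD or schema would also govern element order and how many times an element may appear.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model (an assumption for illustration only):
# for each parent element, the set of child elements it may contain.
ALLOWED_CHILDREN = {
    "chapter": {"title", "section"},
    "section": {"title", "para", "question"},
    "question": {"stem", "choice", "explanation"},
}

def find_violations(xml_text, allowed=ALLOWED_CHILDREN):
    """Return (parent, child) tag pairs that break the content model."""
    root = ET.fromstring(xml_text)
    violations = []
    for parent in root.iter():
        # Elements missing from the table are treated as leaves (no children allowed).
        permitted = allowed.get(parent.tag, set())
        for child in parent:
            if child.tag not in permitted:
                violations.append((parent.tag, child.tag))
    return violations

# Well-formed XML that still breaks the rules: a <para> directly under <chapter>.
sample = """
<chapter>
  <title>Algebra</title>
  <para>Loose paragraph outside any section.</para>
  <section>
    <title>Practice</title>
    <question><stem>2 + 2 = ?</stem><choice>4</choice></question>
  </section>
</chapter>
"""

print(find_violations(sample))
```

The point of the sketch is that the sample parses without complaint yet still gets flagged, which is the gap between "it is XML" and "it follows our content structure standard."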
At Kaplan Publishing we are the primary retail facilitator between the student and the business unit. There are 4 business units, at least 4 different systems (LMS, CMS, etc.), and varying priorities across the business units based upon customer, test, and product needs. As a shared service, our goal is to support these different business units and their varying goals and priorities while also seeking to synchronize as much as possible across them. It would be easy to look at this structure and say that the solution is obvious: we have an overarching management structure, so why can’t they implement standardization from above? The answer is that it’s not that easy. At Kaplan Test Prep, the 4 business units serve very distinct constituents. The needs, desires, and requirements of our pre-college business unit are substantially different than those of our medical business unit. Add in the mature technical and content infrastructures utilized by each business unit and it becomes clear that the only good place to actualize change and standardization is the shared point of contact for these business units: Kaplan Publishing. The challenge is that since each business unit is basically independent, but reliant upon a single funding source, they (generally speaking) have little reason to cooperate with one another. This gets to the primary challenge of being a shared service. While we might be ideally placed for actualizing change and innovation, we’re also at the mercy of the business units, since we rely upon their cooperation to actually get work done.
Once we found out that round tripping didn’t work for us, we shifted our focus to implementing a “single source of truth” for our content. We acknowledged the inconsistencies in our INDD content and thought that by stripping out the visual content standards and imposing content structure standards we’d still be able to utilize our INDD content in the short term, while using the DITA content as a source for non-print products. Eventually, we hoped we’d be able to standardize the InDesign content, build the necessary transforms for multiple delivery formats, and shift INDD from a source to an output. To date, we have not successfully scaled this process, as moving INDD content into KTP DITA and then transforming it proved difficult and time consuming. The iBooks project is a perfect example.
The iBooks project grew out of our attempt to utilize KTP DITA and is the best example of how challenging it is to utilize inconsistently structured InDesign content as a source. Ultimately, the process for getting our content out of InDesign and into iBooks exemplifies all of the challenges I outlined in the last section:
Repetition of effort
As you can see, we had to repeat many of the QA steps seen in print production: proofreading, typesetting, and copyediting. The content for the iBooks was taken from already published materials (GRE 2013, NCLEX RN 2012-13, MCAT 5 Book Omnibus 3rd Edition) where all of these steps had been taken, but they needed to happen again since we were moving to a different output format (than print) and our InDesign content was “standardized” according to the visual requirements of each SBU/test (Grad, Medical, GRE, MCAT, NCLEX).
Too much manual work
In order to get assessment content from InDesign to iBooks we followed this path: InDesign → INDD XML → DITA → Burst DITA. This resulted in a tremendous amount of manual work. In InDesign I implemented a styles-to-(XML)-tags mapping and had to manually QA each file (generally a chapter) to ensure consistency, since I had no guarantee that the styles were used consistently (demonstrating the challenge of using legacy content). Once the content was exported to INDD XML, I validated it against an XSD schema and manually added any elements or structures I had to (for indicating the start of a new question, question set, explanation section, etc.) so that the XSLTs would transform the content correctly. Before I transformed the content, I would also manually create the metadata for the book and link it to the XSLTs we used to transform the INDD XML. The next step was to transform the INDD XML content to KTP DITA XML. Even if the content was properly structured in the INDD XML, I would still have some manual work to do.
Additionally, DITA is founded upon the use of “topics” and “maps.” These are hierarchical structures (maps → topics → topics). Since INDD exports a flat XML document, we had to create a flat DITA document before utilizing a second set of XSLTs to “burst” the DITA and give it its hierarchical structure. Finally, the burst KTP DITA would be handed to our Tech Lead, who further transformed the content into an interactive widget. At this point it was ready for use in the assessment widgets our iBooks utilized. It’s also important to see that this is JUST for the assessment content (practice and chapter tests) AND that non-assessment content (everything else) went through a completely different conversion pathway. So why did I go through this on a step-by-step basis? It was necessary to demonstrate the need for highly structured content that is created separately from any display requirements. Even with these content structure standards I still had a whole lot of manual work to do. I would also like to acknowledge the challenge that writing the XSLTs presented for the vendor we used, Scriptorium. Simon Bate eventually left us with an excellent set of XSLTs, yet they were essentially incomplete, since inconsistencies in the INDD content made it impossible to programmatically anticipate all possible variations. Additionally, this process was supposed to take a few weeks, but ended up taking a couple of months.
Lack of standards and reconciliation of different standards
As I’ve stated before, the absence of content structure standards (they simply weren’t necessary when our source documents were created) and their corresponding necessity in most digital formats (such as KTP DITA or ePub) meant that we had to create a new set of rules. However, due to the challenge of updating all of Kpub’s products (there are thousands of them), we have essentially created two classes of InDesign documents: those structured for KTP DITA output (a tiny minority) and those that are not.
Shared service
As a shared service for KTP we are beholden to the content requirements of the various SBUs and tests. Since Kpub (and sometimes KTP) doesn’t own the content, any changes or content revisions had to be run by the content owners. The challenge of this situation is that it prevents Kpub from implementing a truly uniform content structure standard, since we still had to make accommodations for each business unit’s particular content requirements.
InDesign as a source
Inconsistently applied paragraph and character styles meant that our InDesign content could not simply be extracted and transformed. It had to run through several time-consuming steps, such as styles-to-tags mapping and image anchoring, in order to be used for other formats.
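As a rough illustration of why a styles-to-tags mapping breaks down on inconsistently styled content, here is a small Python sketch. It does not use InDesign or its export; the style names, the `STYLE_TO_TAG` table, and the `audit_styles` helper are all hypothetical stand-ins for the kind of check I had to do by hand.

```python
from collections import Counter

# Hypothetical mapping from paragraph style names to XML element names.
STYLE_TO_TAG = {
    "Chapter Title": "title",
    "Body Text": "para",
    "Question Stem": "stem",
}

def audit_styles(paragraphs, mapping=STYLE_TO_TAG):
    """Split (style, text) pairs into tagged output and a count of unmapped styles."""
    tagged = []
    unmapped = Counter()
    for style, text in paragraphs:
        tag = mapping.get(style)
        if tag is None:
            # A style that looks right on the page but has no agreed meaning.
            unmapped[style] += 1
        else:
            tagged.append((tag, text))
    return tagged, unmapped

# Made-up chapter: "body text 2" renders like "Body Text" but is a different style.
chapter = [
    ("Chapter Title", "Algebra"),
    ("Body Text", "Linear equations have one unknown."),
    ("body text 2", "Same look, different style name."),
    ("Question Stem", "Solve x + 1 = 3."),
]

tagged, unmapped = audit_styles(chapter)
print(tagged)
print(dict(unmapped))
```

Every entry that lands in `unmapped` is a paragraph a person has to look at, which is where the manual QA time goes when style usage varies from title to title.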
It all boils down to this conundrum: moving content from a limited to a highly structured situation, and vice versa, is incredibly difficult. On the surface it seems like a relatively simple challenge:
• You have limited content standards because your content was built for a single purpose (print).
• Thus you have a situation where you need additional standards: multi-format publishing.
• You create the appropriate standards.
• You implement them on existing content and export/transform the content accordingly.
As Kpub discovered, it’s significantly more difficult than this.
1) In our case, “limited” structure → “highly” structured meant that we had to create and implement a content structure standard that took into account structural cues from a visual content standard (INDD).
2) We then had to translate these cues into our content structure standard.
3) After this, I extracted the content into a usable form (INDD to XML) and synchronized the content by eliminating or merging any inconsistent elements, uses, or placements.
4) Then I had to utilize a set of transforms to get the extracted content (INDD XML) into a form where it could be digested by a tool. This adds another 2 steps onto the process: one transform so that the appropriate XML tags were used and the proper structure approximated, and then a 2nd transform to “burst” the content and give it the hierarchical structure KTP DITA required.
Along the way there were plenty of manual QA steps added into the mix.
What this meant is that unless a company is able to impose a uniform content structure standard, it’s almost impossible to make this process scalable due to the amount of repetitious and manual work that needs to be done. This in turn makes the possibility of creating an efficient multiple-format publishing workflow extremely low.
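The second, “burst” transform described above can be sketched in a few lines of Python. This is only an illustration of the idea of turning a flat export into a hierarchy; our actual transforms were XSLTs, and the element names here (`doc`, `topics`, `topic`) and the rule that every `title` starts a new topic are assumptions, not KTP DITA.

```python
import xml.etree.ElementTree as ET

def burst(flat_xml, start_tag="title"):
    """Group a flat run of elements into <topic> elements, one per start_tag."""
    flat_root = ET.fromstring(flat_xml)
    nested_root = ET.Element("topics")
    current = None
    for element in flat_root:
        # Open a new topic at each start_tag (or for any content before the first one).
        if element.tag == start_tag or current is None:
            current = ET.SubElement(nested_root, "topic")
        current.append(element)
    return nested_root

# Made-up flat export: titles and paragraphs are siblings with no nesting.
flat = (
    "<doc>"
    "<title>Q1</title><para>Stem one</para>"
    "<title>Q2</title><para>Stem two</para><para>Explanation</para>"
    "</doc>"
)

nested = burst(flat)
print(ET.tostring(nested, encoding="unicode"))
```

The grouping only works if the start marker is applied consistently. If one chapter marks a new question with a different style or element, the hierarchy comes out wrong, which is why the manually added structural elements were needed before transforming.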
If, in our current process, it takes on average 2 weeks (not counting QA time) to output a single iBook, how can we expect to scale this process so that we can produce different products (ePub, iBook, HTML5, print, etc.) simultaneously, efficiently, and accurately? The answer is that without content standardization, process streamlining, and the imposition of content structure standards (more than one standard is allowed in each case), you can’t expect a scalable multi-format content production process.
1) Competing business goals and priorities are a challenge because there’s never a winner. If someone is temporarily on top, it just means that you follow their requirements until they get booted out, and then you have to pivot. Pivoting is hard work.
2A) Non-standardization leads to a lack of automation. A lack of automation leads to a non-scalable process.
2B) Limited resources means a couple of things. First, at Kaplan Publishing our content team is 2 people. Second, money is always tight. These challenges mean that more often than not you’re not able to address all potential and real challenges, only those that are most pressing.
3) Multiple existing workflows is a fairly straightforward conclusion. Different business units all have a different way of doing things. This in turn makes everything harder for the publishing team, because they then have to reconcile these differences in a way that doesn’t eliminate or cover up any steps unique to a particular workflow.
4) Every business unit has different requirements. Different formats have different requirements. These differences make it challenging to implement a single standard for different outputs.
5) Things are changing. Not hour to hour, but quickly enough. If companies cannot keep up with these changes, or at least make smart bets, then they run the risk of failure.
This looks very similar EXCEPT it uses XML/HTML as the content source instead of INDD. What this means is that all of the content proofreading and copyediting would occur in one spot (XML/HTML) and would only have to occur once (or whenever content needs to be updated). Typesetting and other visually dependent factors would still occur on a format-by-format basis, but would be guided by strict format-specific content standards. This consistency means that only a minimal amount of QA needs to occur, thus reducing the amount of time needed to produce products. As I’ve stressed several times, the key ingredients are consistency, reduction in manual work, and reduction in process repetition. By streamlining the process and implementing content standards and content structure standards, it will be possible to create a single source of content for multiple formats. This should not be mistaken for a cure-all. This consolidation does not solve all of the challenges I outlined earlier in the presentation. For example, not all manual tasks will be automatable, and this process assumes that content currently in InDesign will either be transformed to XML/HTML or, if new content has to be created, written natively in XML/HTML. What this solution does do is make addressing the challenges of legacy content and automation possible.
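To show what “one structured source, format-specific output” looks like in the smallest possible form, here is a Python sketch. It is an illustration under assumptions, not our production pipeline: the `SOURCE` snippet, its element names, and the two renderer functions are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source: proofread and copyedited once, in one place.
SOURCE = (
    "<section>"
    "<title>Ratios</title>"
    "<para>A ratio compares two quantities.</para>"
    "</section>"
)

def to_html(section):
    """Render the section for an HTML-based output."""
    parts = [f"<h2>{escape(section.findtext('title'))}</h2>"]
    parts += [f"<p>{escape(p.text or '')}</p>" for p in section.findall("para")]
    return "\n".join(parts)

def to_text(section):
    """Render the same section as plain text with an underlined heading."""
    title = section.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text or "" for p in section.findall("para")]
    return "\n".join(lines)

section = ET.fromstring(SOURCE)
html_output = to_html(section)
text_output = to_text(section)
print(html_output)
print(text_output)
```

A content fix is made once in the source, and each renderer picks it up on the next run. Only the presentation rules live in the format-specific code.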
By utilizing a single content structure, repetitious work can be eliminated or at least reduced. This in turn will lead to a more efficient process, which again isn’t a cure-all, but represents a significant improvement over our existing processes. Not only does reducing repetition shorten production timelines, but it also makes content updates, errata fixes, metadata creation and implementation, etc. significantly easier. Maintaining one source is always easier than maintaining 2, 3, or more.
What this all leads to is a predictable process. This in turn allows for the implementation of software (automation). Automation is reliant upon a predictable process, which requires that content have a consistent structure. Once automation reaches a certain level of coverage and quality, a process can become scalable. The key to calling a process “scalable” is the elimination, totally or down to an acceptable level, of human input except for QA and content creation, update, and removal.
Increasing process efficiency is not a panacea for all the challenges facing the publishing industry, but it does address a significant issue and blocker. By introducing and enforcing content structure standards, content becomes more flexible in its application. It means that content can be used in more places, more reliably, since the underlying structure is consistent throughout. No company operates in a vacuum; there are always legacy content and structures to deal with. The challenge is managing the transition from these systems to your new consolidated processes. At Kaplan Publishing we’re still wrestling with this transition. Based upon my experience over the last year, I believe we’re well on our way to moving off our legacy process, but there’s still a lot of work to be done. The ultimate takeaway from all of this is that it’s worth it. Reducing the amount of time required to WORK the content means that there’s more time to innovate or improve the content. This in turn will allow companies to better serve their customers, since consistently structured content is the baseline for being able to quickly and efficiently meet customer needs.