Tuesday, August 31, 2010

EPUB Development - The Details

Connexions recently introduced EPUB files for Collections and Modules. EPUB files are electronic books (ebooks) that can be read on mobile devices, on dedicated ebook readers, and on computers running ebook reader (eReader) software. Creating the EPUB files presented its own set of problems, and we had to find a solution for each.

Our content is stored as XML files at the Collection (collxml) and Module (cnxml) levels. To convert a Collection to an EPUB, we first retrieve the Collection and Module XML files. We then convert them to Docbook, and from Docbook to the HTML that goes into the EPUB file. Docbook already had a Docbook-to-EPUB transformation that generates the EPUB-specific configuration files and directory structure. However, that transformation is geared toward standard books, not textbooks: it lacked transformations for bibliographic information, annotations, and more detailed footnotes, so we added them. We also added linkage between Exercises and Solutions. The whole process is driven by a Ruby script that originally came from Docbook.
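
For readers who have not looked inside an EPUB, the packaging step at the end of this pipeline can be sketched as follows. This is a minimal illustration, not the Connexions or Docbook code: it assumes the HTML and the OPF package file have already been generated, and the `OEBPS/content.opf` path and the `package_epub` helper are hypothetical. What it does show is the container layout an EPUB requires: a zip archive whose first entry is an uncompressed `mimetype` file, plus a `META-INF/container.xml` that tells the reader where the package file lives.

```python
import zipfile

# Minimal META-INF/container.xml: points the reader at the OPF package file.
# The OEBPS/content.opf location is an assumption for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

def package_epub(path, files):
    """Write an EPUB archive (hypothetical helper).

    `files` maps archive names (e.g. "OEBPS/content.opf", "OEBPS/ch01.html")
    to their str or bytes content.
    """
    with zipfile.ZipFile(path, "w") as zf:
        # The EPUB container format requires "mimetype" to be the first entry,
        # stored without compression, so readers can sniff the file type.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else (OPF, navigation, HTML, images) can be compressed.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `package_epub("book.epub", {"OEBPS/content.opf": opf, "OEBPS/ch01.html": html})` produces a file that an eReader can open, provided the OPF itself lists the HTML files correctly.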

Because EPUBs are usually displayed on small screens, we had to create a new stylesheet. It uses less indentation so that the text fills all of the available screen area. We also modified the boxes around notes and examples so that they still set the content apart but take up less space.

We tested the EPUBs on a variety of devices. Initial developer testing used the ebook reader in Calibre or the Firefox EpubReader plugin. Formal testing used an iPad and an iPod Touch, running both Stanza and iBooks on each device. More recently, we have also tested on the enTourage eDGe. "Friends and family" did some outside testing on a variety of platforms, including iPads, iPhones, iPod Touches, and Droid phones.

There were several challenges in creating the EPUB files:
  • Handling MathML: EPUB readers do not support MathML, so the obvious solution was to convert simple MathML to text and all of the more complex MathML to images. We first tried converting to SVG, but found that although SVG is part of the EPUB spec, the ebook readers do not support it. Converting to PNG, which all of the readers support, solved the problem.
  • Cover Images: Ebook readers need a cover image for their "bookshelf" display. The image had to be customized with the title of each EPUB and generated when the EPUB was created. Our team designed the cover as an SVG image. When an EPUB is created, its title is added to the SVG, which is then converted to a PNG image for the cover. The same image is also used on the title page of the EPUB. The image is created with Inkscape, using the automatic text reflow (flowed text) from the as-yet-unreleased SVG 1.2 specification.
  • Metadata: CNXML had a problem with Module metadata that was corrected in version 0.7, but most of our content is still stored as CNXML 0.5. We solved this by adding a CNXML 0.7 copy of every module's CNXML to both the Collection source zip and the Module source zip. The EPUB generation retrieves its XML from these source zip files. The upgraded CNXML gives it access to the needed metadata, and it also lets developers use the source zip files without worrying about which CNXML version a module was written in.
  • EPUB Limitations: As we tested our EPUB files, we discovered some limitations of the EPUB format and of the readers. Since Connexions authors have not been entering content with EPUBs in mind, some content does not display correctly in an EPUB. We created a set of author guidelines that describes these limitations and offers suggestions for entering content that will display correctly in an EPUB.
  • Offline HTML Zip: There have been numerous requests for a downloadable HTML version of Modules and Collections. Since creating EPUBs already required a new HTML version of the content, it seemed like a good time to offer an Offline HTML version as well. We soon discovered that what looks good in an EPUB reader does not always look good in a browser. Our original plan was to reuse the HTML generated for the EPUB, but we abandoned it in favor of a separate HTML generation step that corrects those problems. See the downloads help page for more information on the files available.
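
The MathML split described under "Handling MathML" needs some rule for deciding what counts as "simple." The post does not say which rule Connexions uses, so the following is a hypothetical heuristic, shown only to make the idea concrete: if a formula uses nothing but tokens and plain rows (`mi`, `mn`, `mo`, `mtext`, `mrow`, `mspace`), it is flattened to text; anything with structure such as fractions, roots, scripts, or tables is routed to an image renderer, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: elements that can be flattened to a plain run of text
# without losing meaning. Anything else (mfrac, msqrt, msup, mtable, ...) is
# treated as "complex" and sent to a PNG renderer (not part of this sketch).
TEXT_SAFE = {"math", "mrow", "mi", "mn", "mo", "mtext", "mspace"}

def local_name(tag):
    """Strip the namespace: '{http://www.w3.org/1998/Math/MathML}mi' -> 'mi'."""
    return tag.rsplit("}", 1)[-1]

def triage_math(math_xml):
    """Return ("text", flattened) for simple MathML, ("image", None) otherwise."""
    root = ET.fromstring(math_xml)
    # root.iter() visits the root element and all of its descendants.
    if all(local_name(el.tag) in TEXT_SAFE for el in root.iter()):
        # Join the token contents, dropping the whitespace between elements.
        text = "".join(t.strip() for t in root.itertext())
        return ("text", text)
    return ("image", None)
```

With this rule, `<mi>x</mi><mo>+</mo><mn>1</mn>` becomes the text `x+1`, while any formula containing an `mfrac` goes down the image path. A stricter or looser whitelist simply moves the line between reflowable text and fixed-size images.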
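
The title substitution from "Cover Images" can be sketched with the standard library alone. This assumes a cover template in which the title sits in a `flowPara` element carrying the hypothetical id `cover-title`; the template below is an illustrative stand-in, not the Connexions cover. The `flowRoot`/`flowRegion`/`flowPara` elements are the SVG 1.2 draft flowed-text elements that Inkscape implements, so the title wraps inside the region's rectangle when rendered. The Inkscape step that converts the SVG to PNG is left out because its command-line options differ between Inkscape versions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)

# Hypothetical cover template: a background plus a flowed-text box for the title.
COVER_TEMPLATE = f"""<svg xmlns="{SVG_NS}" width="600" height="800">
  <rect width="600" height="800" fill="#336699"/>
  <flowRoot>
    <flowRegion><rect x="50" y="200" width="500" height="400"/></flowRegion>
    <flowPara id="cover-title">TITLE</flowPara>
  </flowRoot>
</svg>"""

def make_cover_svg(title, template=COVER_TEMPLATE):
    """Return the cover SVG with `title` placed in the flowed-text box."""
    root = ET.fromstring(template)
    para = root.find(f".//{{{SVG_NS}}}flowPara[@id='cover-title']")
    if para is None:
        raise ValueError("template has no flowPara with id='cover-title'")
    # ElementTree escapes &, < and > when serializing, so any title is safe.
    para.text = title
    return ET.tostring(root, encoding="unicode")
```

Editing the parsed tree, rather than doing a string replace on `TITLE`, means titles containing characters such as `&` still produce well-formed SVG for the rasterizer.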
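
The "Metadata" point means a consumer of a source zip can always ask for the CNXML 0.7 copy. A minimal sketch of such a consumer, assuming a zip with one directory per module and hypothetical file names `index.cnxml` (original) and `index_v07.cnxml` (upgraded copy); the actual names and layout of the Connexions source zips may differ.

```python
import zipfile

# Hypothetical file names for the original CNXML and the upgraded 0.7 copy
# that sit side by side in each module's directory inside the source zip.
ORIGINAL_NAME = "index.cnxml"
UPGRADED_NAME = "index_v07.cnxml"

def read_module_cnxml(zip_path, module_dir):
    """Return a module's CNXML text, preferring the upgraded 0.7 copy."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        # Try the upgraded copy first; fall back to the original file.
        for name in (UPGRADED_NAME, ORIGINAL_NAME):
            member = f"{module_dir}/{name}"  # zip members always use "/"
            if member in names:
                return zf.read(member).decode("utf-8")
    raise KeyError(f"no CNXML found for {module_dir!r} in {zip_path}")
```

The fallback to the original file is a design choice for this sketch; code that depends on 0.7-only metadata could instead raise an error when the upgraded copy is missing.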

The EPUB files are an important step in allowing Connexions users to have their content easily available on multiple platforms. We welcome any feedback so we can continue to improve them.


  1. Congratulations on what looks to be some fine work!

    I'm doing something similar for the European Environment Agency's Plone portal (http://eea.europa.eu). Basically we want to be able to provide ePub versions of our articles and also the ability to import an ePub file and have a folder/document structure generated from the chapters.

    Our work is available as the eea.epub product:

    Do you see any collaboration possibilities?

    Per Thulin

  2. @Per - Our EPUB code is specific to our application since it uses CNXML and CollXML as the starting point. However, it is open source (LGPL license) and you are free to use all or part of it. You can browse the source at http://bit.ly/b5i27U or download it at https://software.cnx.rice.edu/svn/rhaptos/packages/Products.RhaptosPrint/trunk/Products/RhaptosPrint/epub