File Formats Used in Publishing

Introduction

Digital publishing today involves many different file formats, and a book’s format usually changes during the production process. A manuscript may start in one format and be converted along the way until it reaches the final published format that the reader sees.

Publishing source files, such as Microsoft Word, PageMaker, Apple Pages, or InDesign files, are rarely given to consumers because they can be edited. These intermediary formats are used until the book is ready to be published in a format that is typically read-only, such as EPUB (Electronic Publication) or PDF (Portable Document Format).

A typical workflow might look like this. The author writes the book’s chapters in a word processor such as Microsoft Word. The document is sent to a publisher who, either in-house or through a partner conversion vendor, converts the Word document into an InDesign file and adjusts the layout to determine how the book will finally look. The book is then exported to its final formats: typically a PDF if it is being sent to a printer to make a paper copy, or an EPUB, which is the industry standard for digital books.

Here are the most widely used publishing file formats and tools:

Word is the most commonly used manuscript drafting tool today. Files are easily shared with others, and Word can also serve as a basic book-design tool to create PDFs and even EPUBs, the latter directly with the free WordToEPUB tool from DAISY. Word creates a publishing source file that can be opened by virtually any other program. Another benefit is that Word can perform an initial accessibility check for the author.

Many software programs can create a nice-looking book, but InDesign has emerged as the most popular among publishers and conversion vendors. It continues to improve, and it has a large support community that includes training programs. Although InDesign is by far the most widely used book publishing tool, it still needs a lot of work in terms of accessibility. Work has started on improving the accessibility of books published with InDesign, but it will take some time to complete. In the meantime, additional work is needed to make the EPUBs exported from an InDesign workflow fully accessible. InDesign can export to EPUB, but the quality of the resulting EPUB unfortunately has many issues, and an experienced person is needed to edit the file to make it suitable for the commercial market.

Is considered a standard professional publishing tool to design covers and create graphics. The image files it creates are added to the original manuscript. These images also need to be described with alternative text so that blind readers know what they depict.

Most programs can create a PDF, and most computers can open one. However, these files cannot be edited and reformatted the way a Word document can, and they are used primarily for printing. It is also important to note that there are two main types of PDF. Traditional PDFs are either image-based or text-based and are not very accessible. Even a text-based PDF does not preserve the reading order, so a screen reader user may end up hearing paragraph 2 before paragraph 1. The more accessible type is the “Tagged” PDF, or PDF/UA (PDF/Universal Accessibility).

Mobi is one of the Amazon Kindle eBook file formats. Amazon supports several Kindle file formats, but Mobi is the one most commonly used by self-publishers.

EPUB is the industry-standard eBook format, supported by virtually all publishers and retailers. In fact, as of 2021, Amazon suggests that self-publishers upload EPUB files for publishing Kindle eBooks. Once a file is uploaded, Amazon converts the EPUB into its proprietary format.

PDF

PDF has historically been the primary final document format for publishers. Once they have finalized how the book will look, they export it to a PDF, which can be used to print the paper book. When readers wanted a digital version instead, a PDF was typically provided.

Because accessibility was never considered, these PDFs were not accessible. Readers who needed an accessible version would end up purchasing the paper book, chopping off its spine, and scanning each page. Optical Character Recognition (OCR) would then recognize the text on the scanned pages and create a somewhat accessible version of the book. Bookshare by Benetech is one collection of digital books that includes these “chop-n-scanned” books. Their quality is quite low: an OCR’ed book typically contains character recognition errors, and all images are removed during the process. Bookshare still uses the chop-n-scan process when a student requests a book that is not in its collection, usually because the publisher does not give its books to Bookshare.

Disability Service Offices (DSOs) that receive a request from a disabled student ask the publisher for a digital version of the textbook the student needs. If they receive a PDF, they have to remediate the book manually, adding the tagging that PDF/UA requires to make the PDF accessible.

Note: because the original PDF and the more accessible PDF/UA share the same “.pdf” file extension, one cannot tell from the file name which kind of PDF one has. Only when the file is opened in a PDF reader with assistive technology does it become apparent whether the PDF is accessible.
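The file name gives no hint, but the file contents sometimes do. As a rough first check before opening a PDF with assistive technology, a script can look for the markers that a tagged PDF normally carries: a structure tree (`/StructTreeRoot`) and a `/Marked true` entry. The sketch below is a crude heuristic under stated assumptions, not a conformance test. The function name `looks_tagged` is hypothetical, the check only scans raw bytes, and PDFs that store these entries in compressed object streams will be reported as untagged even when they are tagged. A positive result also says nothing about the quality of the tags or about PDF/UA conformance.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: do the raw bytes mention a structure tree and a 'Marked true' flag?

    A False result is inconclusive (the entries may be compressed), and a True
    result does not prove that the document is actually accessible.
    """
    has_structure_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_structure_tree and is_marked


# Hand-written fragments standing in for real files (illustrative only).
tagged_sample = (
    b"%PDF-1.7\n1 0 obj << /Type /Catalog "
    b"/MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >> endobj"
)
untagged_sample = b"%PDF-1.4\n1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj"

print(looks_tagged(tagged_sample))
print(looks_tagged(untagged_sample))
```

For a real file, the bytes could be read with `open(path, "rb").read()`. Testing with a screen reader remains the reliable way to judge whether a PDF is usable.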

EPUB

EPUB is built on Web technology. Under the hood, an EPUB is just a zip file containing the book’s content, such as text, images, audio, and video files, presented as Web (HTML) pages that are styled and linked together in a structured way. The EPUB container also holds metadata about the book, including the title, author, ISBN, and copyright information and, more recently, accessibility metadata describing the EPUB’s features, hazards, access modes, conformance, and any certification declaration, if present.
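To make the “zip file plus metadata” idea concrete, the following sketch builds a tiny EPUB-like container in memory with Python’s standard `zipfile` module and then reads its metadata back. It is an illustration under stated assumptions: the title, author, placeholder ISBN, and file names are made up, and the package is deliberately incomplete (for example, it has no navigation document), so it would not pass EPUB validation.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# container.xml tells a reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# The package document carries the metadata (all values here are placeholders).
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:isbn:0000000000000</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:creator>Sample Author</dc:creator>
    <dc:language>en</dc:language>
    <meta property="schema:accessMode">textual</meta>
    <meta property="schema:accessibilityFeature">alternativeText</meta>
    <meta property="schema:accessibilityHazard">none</meta>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Chapter 1</title></head>
<body><p>Hello, reader.</p></body></html>"""

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_sample_epub() -> bytes:
    """Zip the pieces together; the mimetype entry goes first, uncompressed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("EPUB/package.opf", PACKAGE_OPF, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("EPUB/chapter1.xhtml", CHAPTER_XHTML, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_metadata(epub_bytes: bytes) -> dict:
    """Locate the package document via container.xml and collect its metadata."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
    metadata = package.find("opf:metadata", NS)
    info = {
        "title": metadata.findtext("dc:title", namespaces=NS),
        "creator": metadata.findtext("dc:creator", namespaces=NS),
    }
    # Accessibility metadata is expressed as schema.org properties.
    for meta in metadata.findall("opf:meta", NS):
        prop = meta.get("property", "")
        if prop.startswith("schema:"):
            info.setdefault(prop, []).append(meta.text)
    return info


sample_epub = build_sample_epub()
print(read_metadata(sample_epub))
```

The same `read_metadata` function should work on the bytes of a real EPUB file, since it only relies on `META-INF/container.xml` pointing at the package document.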

The current version of EPUB is 3.2, which is an ISO international standard. Work has continued at the W3C, and EPUB 3.3 is expected to become a W3C Recommendation in 2023.

Not only is EPUB gaining in popularity, it is arguably the most accessible book format to date, and more and more publishers are producing fully accessible EPUBs for their readers. Many EPUB reading systems on the market are highly customizable to suit the reader’s needs.

VitalSource, one of the leading education technology solutions providers, has supplied the following statistics on publisher submissions of EPUB versus traditional PDF. Overall, the split has historically been 60/40 EPUB/PDF, 70/30 over the past year, and 80/20 over the past two months (preparation for the fall term). Measured by usage, that is, by what is actually being read, it is 95/5. VitalSource’s Bookshelf EPUB reader is also one of the most accessible reading systems, as tested by EPUBTest.org.


Chapter Summary

EPUB is by far the most accessible eBook format, and the industry is embracing it as the global standard. Because EPUB is built on open Web technologies, assistive technologies that work on web pages also work on EPUB content, making it accessible out of the box. However, an eBook is not guaranteed to be fully accessible just because it is an EPUB. It is up to the publisher to add accessibility enhancements, such as alt text descriptions for images, but at least the mechanism for doing so is well defined. It is now a matter of educating publishers about the need to build accessibility into all the books they produce. Programs such as Benetech’s GCA program help publishers and conversion vendors create fully compliant, accessible EPUBs that work for all readers, no matter which assistive technology they use.
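Because EPUB content documents are Web pages, the alt text mechanism is the ordinary `alt` attribute on `img` elements, and its absence can be detected automatically. The sketch below uses Python’s standard `html.parser` to list images that have no `alt` attribute at all. It is a minimal illustration: the class and function names and the sample markup are hypothetical, an empty `alt=""` is accepted on the assumption that it marks a decorative image, and the check cannot judge whether a description is actually useful.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default.
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(xhtml: str) -> list:
    """Return the sources of images in the markup that lack alt text."""
    finder = MissingAltFinder()
    finder.feed(xhtml)
    finder.close()
    return finder.missing


# Made-up content document fragment with one undescribed image.
sample = (
    '<p><img src="cover.jpg" alt="Book cover showing a lighthouse"/>'
    '<img src="chart.png"/>'
    '<img src="rule.png" alt=""/></p>'
)
print(images_missing_alt(sample))  # ['chart.png']
```

A check like this only catches missing descriptions; writing good ones still requires a person who understands the image and its role in the book.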


References