Ebook Index Information for Publishers – Digital Publications Indexing

Why do nonfiction ebooks even need indexes? Can’t the user just use Search?

Nonfiction (unlike fiction, the main purpose of which is to entertain) focuses on conveying relevant and useful information to the reader (the process of knowledge acquisition or learning). An index facilitates learning by helping users quickly locate general or specific discussions, concepts, and terms within the book, as well as providing an organized and detailed synopsis of the book’s content. While an overwhelming amount of content is now widely available at no cost via the Internet, publishers of nonfiction books offer their customers added value by providing them with vetted, curated content that makes learning more efficient (and even entertaining). The index is a significant component of this value.

Some publishers question the need for indexes in ebooks, since the Search feature seems to offer a viable alternative. After all, if the user wants to “look something up,” they merely type the term in the Search window, and the result appears instantaneously. Search can work extremely well when the user is looking for a specific term or name. For instance, a search for “George Washington” in a book that presents an overview of American history may take the user right to the appropriate section or passage. But suppose the book is a detailed study of the Revolutionary War: a search for “George Washington” may result in hundreds of “hits.” In this respect, the user of an ebook that lacks an index is at a distinct disadvantage compared with the user of the print book that includes one. A well-designed index lists important subtopics (subheadings) under the “Washington, George” index heading: the indexer has broken down the main topic (“George Washington”) into manageable chunks, and the user of the ebook can navigate via hyperlink to the relevant location. This organized presentation of subtopics is an essential feature of indexes that Search does not provide. If there is no ebook index, the user is forced to rely on Search with all its limitations.

It is worth noting that the table of contents (TOC) presents a rudimentary breakdown of the contents and is usually hyperlinked to the corresponding chapter or section, but the TOC provides at best a broad overview and is not a substitute for an index. The TOC, unlike the index, follows the sequential structure of the book. Whereas readers of fiction read the book sequentially, from start to finish, nonfiction book users may be interested in the book’s treatment of a certain topic. They might jump around in the book, looking for those sections that address their specific learning needs. The index, organized by topic rather than sequentially, allows the user to do this efficiently.

Indexes in ebooks sound like a great idea in theory, but the reality is that human-created indexes are increasingly problematic given the drive to “do more, faster, with less,” as well as changes in publishing workflows. How do we include a hyperlinked index and still keep costs and schedules under control?

The cost and time involved in adding hyperlinked indexes to ebooks can range from practically nothing (or a nominal cost) with no added production time, to very expensive with a substantial time requirement. Instead of focusing exclusively on cost reduction, the real question publishers should be asking is: how can a modest investment in ebook indexes significantly increase value for consumers of nonfiction books? Users of nonfiction books will be more inclined to purchase the digital version if the index, rather than being absent, offers enhanced functionality over the print book.

Including basic functional hyperlinked indexes in ebooks produced via traditional print workflows, or in ebooks converted from legacy print books, should not even be up for debate, since the added cost and time are negligible and the benefit to the user is obvious. There are InDesign scripts that automatically hyperlink the index that has already been created for the print book and allow the resulting hyperlinked index to be exported to EPUB. The compromise with this method is that the index locators link to what corresponds to the top of the printed page (even though page numbers may not even exist in the digital file), rather than to the precise location in the text. Regrettably, even such a basic hyperlinked index is often missing in the digital version of print books.
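To make the page-level compromise concrete, here is a minimal sketch of what such a conversion amounts to. It is not the output of any particular InDesign script: the file name `book.xhtml` and the `page_<n>` anchor naming are assumptions made purely for illustration.

```python
import re
from html import escape

# Assumption: the converted text carries one anchor per print page,
# named "page_<n>", in a single hypothetical file "book.xhtml".
def page_link(locator: str) -> str:
    """Link a print locator such as "12" or "45-47" to the top of its first page."""
    first_page = re.split(r"[-–]", locator)[0]
    return f'<a href="book.xhtml#page_{first_page}">{escape(locator)}</a>'

def render_entry(heading: str, locators: list[str]) -> str:
    """Render one index entry with every locator turned into a hyperlink."""
    links = ", ".join(page_link(loc) for loc in locators)
    return f"<p>{escape(heading)}, {links}</p>"

print(render_entry("Washington, George", ["12", "45-47"]))
```

Every link lands at the top of a page, so the reader still has to scan the page for the passage. That is the limitation the more precise methods discussed later are meant to remove.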

Here’s the good news: excellent ebook indexes can be created for a modest cost and time commitment, and (in the future) they have the potential to offer a qualitatively different and improved user experience.

The latter point is crucial, because the index can be the deciding factor for users of nonfiction books to migrate from print to digital. But before we address the potential of ebook indexes, let’s take a brief look at how ebook index creation fits into the publishing workflow.

Currently, most nonfiction books, and the corresponding digital versions, are produced in a traditional print-book workflow: Manuscript → Desktop Publishing (DTP) → PDF and EPUB/MOBI. The index is included as a “chapter” in the DTP file, does not contain any special coding, and can be exported to the EPUB as regular text. As mentioned above, there are software tools available that can make the index “go live” in the EPUB for a very modest cost.

What about a workflow for creating better quality indexes?

To create a better-quality ebook index that hyperlinks to specific locations within the text (not just to the “page” level), anchors need to be added to the text file itself. This means either giving the indexer exclusive access to the DTP file (usually Adobe InDesign) for a period of time, so that the anchors can be added manually, or having the indexer index to specific uniquely identified (numbered) elements in the file. Having the indexer add encoded anchors manually is appropriate for some projects, but in most cases requires a significant time and money commitment. Tools utilized in conjunction with Adobe InDesign can automate tagging, thus reducing time and cost. Alternatively, unique element IDs can be added by the publisher before the book is delivered to the indexer for indexing, either at the manuscript or PDF stage. Each element (e.g., a paragraph, heading, or figure) is assigned a unique ID, and the indexer indexes “to” the ID number rather than to the page. When the index manuscript is returned to the publisher, the ID numbers in the index and the text can be converted into hyperlinks. This method requires some additional steps by the publisher, but once it is incorporated into the workflow, it should not add significantly to production time and costs.
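The final conversion step in the element-ID method can be sketched roughly as follows. The ID format, the file names, and the shape of the index manuscript are all hypothetical. A real workflow would read them from the publisher's own files, and the numbered link text is only one possible choice for an ebook without page numbers.

```python
from html import escape

# Hypothetical map from element ID to the file containing that element,
# as a publisher might record it when the IDs are assigned.
ELEMENT_FILES = {
    "p0042": "ch01.xhtml",
    "p0107": "ch03.xhtml",
    "fig0003": "ch03.xhtml",
}

def resolve(element_id: str, files: dict[str, str]) -> str:
    """Turn an element ID into a link target, failing loudly on unknown IDs."""
    if element_id not in files:
        raise ValueError(f"index refers to unknown element ID {element_id!r}")
    return f"{files[element_id]}#{element_id}"

def build_links(manuscript: dict[str, list[str]], files: dict[str, str]) -> list[str]:
    """Convert an index manuscript (heading -> element IDs) into linked entries."""
    lines = []
    for heading in sorted(manuscript, key=str.lower):
        links = ", ".join(
            f'<a href="{resolve(eid, files)}">{n}</a>'
            for n, eid in enumerate(manuscript[heading], start=1)
        )
        lines.append(f"<p>{escape(heading)}: {links}</p>")
    return lines

# Hypothetical index manuscript as returned by the indexer.
manuscript = {"Valley Forge": ["p0107"], "Continental Army": ["p0042", "fig0003"]}
for line in build_links(manuscript, ELEMENT_FILES):
    print(line)
```

Raising an error on an unknown ID is a deliberate choice in this sketch: a locator that points nowhere is much cheaper to catch during conversion than after the ebook ships.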

What about a digital-first workflow?

Some publishers have already adopted a digital-first (aka XML-first) workflow: Manuscript → XML file → ebook/print/HTML output. The book is finalized in XML format, and can then be exported to PDF, EPUB, HTML, or other formats. XML allows the book to be generated at any stage on demand, which is a great feature for the editor and the indexer since it allows them to “see” the final version of the book as it takes shape. As with the DTP-file indexing process mentioned above, the indexer must manually insert anchors into the file unless the publisher’s ID numbers can be automatically added to the XML file. The process is roughly the same as with print workflows, but the advantage is that the index is “done” in the XML stage and does not have to be converted to another format later on down the road. In digital-first workflows, the digital version really is “first,” and not just a stepchild of the print book.

What is the potential of ebook indexes? Let’s imagine the possibilities!

A precision hyperlinked index that directs the user to specific locations within the text is a valuable tool. Beyond this, however, digital indexes have the ability to offer much more. Any hyperlinked index, because it consists of links that connect index entries with specific points in the text, has the potential to become an embedded index, where the index entries (tags) are located within the text itself. Embedded indexes allow the index to be “recreated” from the text file, or part of the text file (a chapter, for example). For example, several chapters from different books could be combined and a new index generated. The embedded index terms can also be used as essential metadata about the book that increase the book’s “findability” by potential purchasers. (For more information on embedded indexes, see “Embedded Index Information for Publishers.”)
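The idea of regenerating an index from embedded tags can be illustrated with a short sketch. The `[[idx:Heading|Subheading]]` marker syntax below is invented for this example and does not correspond to any particular tool's format. The chapter file names and paragraph IDs are likewise hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax: [[idx:Heading]] or [[idx:Heading|Subheading]]
MARKER = re.compile(r"\[\[idx:([^\]|]+)(?:\|([^\]]+))?\]\]")

def build_index(chapters):
    """Collect embedded markers into heading -> subheading -> link targets.

    `chapters` is an iterable of (file_name, [(paragraph_id, text), ...]).
    Entries without a subheading are stored under the empty string.
    """
    index = defaultdict(lambda: defaultdict(list))
    for file_name, paragraphs in chapters:
        for paragraph_id, text in paragraphs:
            for heading, subheading in MARKER.findall(text):
                target = f"{file_name}#{paragraph_id}"
                index[heading.strip()][subheading.strip()].append(target)
    return index

# Two chapters taken from different (hypothetical) books, combined on demand.
chapters = [
    ("bookA_ch02.xhtml",
     [("p1", "The army crossed at night.[[idx:Washington, George|Delaware crossing]]")]),
    ("bookB_ch05.xhtml",
     [("p9", "Winter quarters.[[idx:Washington, George|Valley Forge]][[idx:Valley Forge]]")]),
]

for heading, subheadings in sorted(build_index(chapters).items()):
    print(heading, dict(subheadings))
```

Because the entries travel with the text, the same function can be run over a single chapter, a whole book, or a new compilation without any re-indexing.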

Digital indexes need not be constrained by the limits imposed on their print cousins. The potential is practically unlimited. Why not use collapsible headings, color coding, or user-controlled displays (“show me only the most significant headings” or “show me all companies mentioned in this business book”)? Since the indexer is already spending the time and effort to create the index, it requires very little additional effort for the indexer to add extra codes to certain index terms. The real challenge will be to create ebook prototypes and software that can take advantage of these possibilities so they can be rendered successfully on a major platform (such as the iPad). It’s a tall order, but meeting this challenge will offer the nonfiction reader a qualitatively improved learning experience!
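A user-controlled display of this kind could work along the following lines. This is only a sketch: the `major` flag and `category` code stand in for whatever extra codes an indexer might attach to entries, and none of the names come from an existing reading system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    heading: str
    target: str           # link target inside the ebook (hypothetical)
    major: bool = False   # extra code: the indexer marks the most significant headings
    category: str = ""    # extra code: e.g. "company" or "person"

def select(entries, *, major_only=False, category=None):
    """Return the entries that match the reader's chosen display options."""
    return [
        entry for entry in entries
        if (entry.major or not major_only)
        and (category is None or entry.category == category)
    ]

entries = [
    Entry("Acme Corp.", "ch02.xhtml#p14", major=True, category="company"),
    Entry("balance sheets", "ch04.xhtml#p3", major=True),
    Entry("Globex", "ch07.xhtml#p22", category="company"),
    Entry("petty cash", "ch04.xhtml#p41"),
]

# "Show me all companies mentioned in this business book."
print([entry.heading for entry in select(entries, category="company")])
# "Show me only the most significant headings."
print([entry.heading for entry in select(entries, major_only=True)])
```

The filtering itself is trivial. As the paragraph above notes, the harder part is getting reading software to expose such options to the user.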

I want to move forward with ebook indexes. Where do I begin?

Here are some tips to get started:

Utilize the talents of a skilled indexer to create a great index for your print AND digital books.
Assemble a team that includes a production expert, a programmer, and an indexer to come up with a new or modified workflow that will allow you to include high-quality indexes in your digital products, while respecting time and budgetary constraints.
“Try out” the new workflow on a couple of pilot projects, before implementing changes company-wide.
Use your team to develop a prototype of a user-directed index.

Checklist for Ebook Indexing

Contact the indexer at the beginning of the project.
Give the indexer a rundown of software tools to be used in the project.
Give the indexer information about any conversion houses or post-processing for the ebook if conversion is not done in-house.
Give the indexer an idea of what outputs will be created from the files: print, ebook, PDF files, web materials, chapters reused in other publications, etc.
Give the indexer an idea of the stages in the process where files can be handed off for indexing or where the indexer can run special tools.
Give the indexer an idea of any constraints on the project (budgetary, schedule, tool issues, translations, etc.).
Decide how closely the hyperlinked entries should land on the ebook page: pinpointed to the sentence, to the paragraph containing the sentence, or to the top of what corresponds to the printed page.
Assign a liaison to work with the indexer for tool decisions, testing and troubleshooting issues. This liaison should also be connected to in-house production, as well as any conversion houses or post-processing agencies that will be used.
Ask for an estimate of the time needed to perform the work with the chosen tools, but be aware time frames will need to account for troubleshooting if this is the first project using a particular set of tools.
Allow time in the schedule for test conversions and ensure that testing is done before beginning index coding for the full ebook.

In the long run, publishers can plan ahead for ebook indexing in every project. The EPUB 3.1 Standard includes a specification for EPUB indexes that allows for new interactivity and new interfaces that can make use of index markup in ebook files. Establishing some practices department-wide will make projects ready for the new features.

What can be done now by publishers:

Read through the EPUB 3.1 Standard to gain an understanding of where ebook indexing is headed.
Investigate the use of scripts and anchor IDs in EPUB 3.1.
Develop an anchor ID scheme, and add IDs to ebook files to be ready for EPUB 3.1.
Put in anchor codes at the paragraph level or sentence level for index entries.
Include active indexes as chapters.
Look at interactive index interfaces to be ready for developments in reader or app support.
Plan for re-use of metadata: wikis, handhelds, print, web pages.
Advocate for more advanced reader software on ebook devices.
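As a starting point for the anchor ID items above, the following sketch adds paragraph-level IDs to an XHTML content file using only Python's standard library. The `c<chapter>-e<number>` naming scheme is an assumption, not something prescribed by EPUB. Any scheme works as long as the IDs are unique and stay stable between editions.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize without an "ns0:" prefix

# Elements an indexer is likely to point at (an illustrative selection).
INDEXABLE = {"p", "h1", "h2", "h3", "figure"}

def add_anchor_ids(xhtml: str, chapter: int) -> str:
    """Give every indexable element an id, keeping any id that already exists."""
    root = ET.fromstring(xhtml)
    counter = 0
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop the namespace part
        if local_name in INDEXABLE:
            counter += 1
            if "id" not in element.attrib:
                # Hypothetical scheme: chapter number plus running element number.
                element.set("id", f"c{chapter:02d}-e{counter:04d}")
    return ET.tostring(root, encoding="unicode")

sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<h1>Valley Forge</h1><p id="keep">First.</p><p>Second.</p>'
    "</body></html>"
)
print(add_anchor_ids(sample, chapter=3))
```

Sentence-level anchors would need inline markers placed by a person or a more elaborate tool. Paragraph-level IDs like these are the part that is easy to automate.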

Additional resources on indexes for ebooks

Web resources:

“How to be the Interface between Publishers and Digital Indexing techniques,” Terry Casey, Jan Wright, and Pilar Wyman, ASI Conference, Cleveland, OH, April 28, 2018:
- Benefits of eIndexes
- Matrix Flowcharts of Software Tools, Processes, and Outputs
Publishing Technology Group of the Society of Indexers: Information for Publishers

Articles:

“Indexes in Ebooks,” Steve Ingle, EPUBSecrets blog, July 23, 2015.
“Executive Summary For Publishers: Indexes in Ebooks,” David Ream, 2014.
“Visualizing Back-of-Book Indexes,” Ceilyn Boyd and Mitch Wade, The Indexer, 2012, vol. 30, no. 1, pp. 25–37.
“Missing Entry: Wither the Ebook Index?,” Peter Meyers, A New Kind of Book blog, September 2, 2011.
“Kindle and the Index,” James Lamb blog, May 1, 2011.
“Ebook Indexes and User Interface Features,” Joe Wikert, Joe Wikert’s Digital Content Strategies blog, June 2010.
“What Is Wrong with Full Text Searches,” James Lamb blog, February 10, 2004.

Presentations:

“Indexes for Digital Publications: The Battle of the Books,” Stephen Ingle, New England chapter of ASI/Bookbuilders of Boston/ASI Digital Publications Indexing SIG 2018 joint meeting.
“How We Read Digitally: ebookcraft 2018 research study,” Noah Genner & Monique Mongeon, BookNet Canada. [Slides 27 and 28 are of special interest to indexers.]
“Matrix Revolutions: Ebook Indexing,” Pilar Wyman, eBookcraft 2016, Toronto, ON, Canada.
“Ebook Indexes: Changes are Happening Fast,” Jan Wright.

Videos and podcasts:

Tools of Change 2012 Proposal, Jan Wright, August 26, 2011.
We Don’t Need Indexes in Ebooks, Right? video interview with Kevin Broccoli, O’Reilly Tools of Change for Publishing, March 21, 2012.
Content Matters: Search Can’t Replace a High-Quality Index, Kevin Broccoli, O’Reilly Tools of Change for Publishing podcast, March 28, 2012.
Ebook Indexing, Jan Wright, Ebook Ninjas podcast episode 68, May 8, 2012.