Why do ebooks even need indexes? Can’t the user just use Search?
Unlike fiction, whose main purpose is to entertain the reader, nonfiction books are focused on conveying useful information (in other words, knowledge acquisition or learning). The index functions as a learning tool by helping the user locate specific information efficiently within the book, as well as providing an organized outline of the book’s contents. In both print and ebooks, the table of contents (TOC) provides some breakdown of the contents, but only in limited detail, and it reflects the sequential structure of the book rather than its topics. With ebooks, we have an additional method for locating information quickly: the Search feature. Search is a powerful tool when the reader is looking for a specific nugget of information, the proverbial “needle in the haystack.” However, if the ebook user is interested in an important subtopic, searching for that subtopic may result in hundreds of hits that have to be sifted through, one at a time. A well-designed index, on the other hand, will include important subtopics within an alphabetical list of headings. The user can easily locate the specific subtopic in the index, where it is broken down into a limited number of manageable “chunks” (subheadings). The user scans the subheadings and follows the page locator or hyperlink to the pertinent information or concept. This enhanced findability and organized presentation of subtopics is an essential feature of indexes in nonfiction books that Search does not provide.
Indexes in ebooks sound like a good idea in theory, but how do we make this a reality within our time and budgetary constraints?
The cost and time involved in adding hyperlinked indexes to ebooks can range from practically nothing (or a nominal cost) with no added time to develop, to very expensive with a significant time requirement. Fortunately, most ebook indexes can be created for a modest cost and time commitment. Let’s look at three scenarios involving different workflows.
What about a traditional workflow with an ebook output? Where does the digital index come in?
The reality is that most nonfiction books, and the corresponding digital versions, are still produced with a traditional print-book workflow: Manuscript → Desktop Publishing (DTP) → PDF and EPUB/MOBI. The index is included as a “chapter” in the DTP file, does not contain any special coding, and can be exported to the EPUB as regular text. There are software tools available that can make the index “go live” in the EPUB for a very modest cost, but the drawback is that the index locators hyperlink to what corresponds to the top of the printed page (even though page numbers may not even exist in the digital file). This is a “quick and dirty” method, and it represents the minimum acceptable standard for all nonfiction ebooks, but at least it produces a functional hyperlinked index. There is just no excuse for the print-version index to be “missing in action” in the digital version. If we are expecting nonfiction readers to migrate to digital formats, we shouldn’t ask them to make do without the index.
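As a rough illustration of what such a tool does, the Python sketch below turns the page locators in a plain-text index entry into hyperlinks that land at the top of the corresponding printed page. It is only a sketch under stated assumptions: the `PAGE_FILES` map, the `page_N` anchor naming, and the file names are hypothetical, and commercial tools will differ in detail.

```python
import re
from html import escape

# Hypothetical map from printed page number to the XHTML file that contains
# the matching top-of-page anchor (assumed here to be named "page_N").
PAGE_FILES = {12: "ch01.xhtml", 45: "ch03.xhtml", 46: "ch03.xhtml", 47: "ch03.xhtml"}

# A locator is a page number or a page range such as 45-47.
LOCATOR = re.compile(r"\d+(?:[-–]\d+)?")


def link_entry(entry: str) -> str:
    """Convert the page locators in one index entry into page-level hyperlinks.

    Comma-separated parts that look like a page or page range are treated as
    locators; everything else is kept as heading text. A heading that is itself
    a bare number would be mistaken for a locator in this simple sketch.
    """
    linked_parts = []
    for part in (p.strip() for p in entry.split(",")):
        if LOCATOR.fullmatch(part):
            # A range links to its first page only.
            first_page = int(re.match(r"\d+", part).group())
            target = PAGE_FILES.get(first_page)
            if target is not None:
                linked_parts.append(
                    f'<a href="{target}#page_{first_page}">{escape(part)}</a>'
                )
                continue
        # Headings and unmapped pages stay as plain (escaped) text.
        linked_parts.append(escape(part))
    return ", ".join(linked_parts)


if __name__ == "__main__":
    print(link_entry("anchors, 12, 45-47"))
```

Note that every locator resolves only to the start of a page, which is exactly the limitation described above.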
What about a better quality index workflow?
To create a better quality ebook index that hyperlinks to specific locations within the text (not just to the “page” level), anchors need to be added to the text file itself. This means either giving the indexer exclusive access to the DTP file (usually Adobe InDesign) for a period of time, so that the anchors can be added manually, or having the indexer index to specific uniquely identified (numbered) elements in the file. Having the indexer add encoded anchors manually is appropriate for some projects, but in most cases requires a significant time and money commitment. Tools utilized in conjunction with Adobe InDesign can automate tagging, thus reducing time and cost. Alternatively, unique element IDs can be added by the publisher before the book is delivered to the indexer for indexing. Each element (e.g., paragraph, heading, or figure) is assigned a unique ID, and the indexer indexes “to” the ID number rather than to the page. When the index manuscript is returned to the publisher, the ID numbers in the index and the text can be converted into hyperlinks. This method requires some additional steps by the publisher, but once it is incorporated into the workflow, it should not add significantly to production time and costs.
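The unique-ID approach can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular publisher's tooling: the sample markup, the `e0001`-style ID scheme, the set of tagged elements, and the use of sequence numbers as link text are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter markup, before IDs are added.
CHAPTER = """<body>
<h1>Indexing workflows</h1>
<p>Anchors can be added manually.</p>
<p>Unique element IDs can be added by the publisher.</p>
</body>"""

# Elements that receive an ID in this sketch (an assumed, illustrative set).
INDEXABLE = {"h1", "h2", "h3", "p", "figure"}


def assign_ids(root: ET.Element, prefix: str = "e") -> ET.Element:
    """Give every indexable element a sequential unique id, in document order."""
    counter = 0
    for elem in root.iter():
        if elem.tag in INDEXABLE:
            counter += 1
            elem.set("id", f"{prefix}{counter:04d}")
    return root


def link_index(entries, filename: str):
    """Turn (heading, [element ids]) pairs into hyperlinked index lines.

    With no page numbers available, each link is labelled 1, 2, 3, ...
    """
    lines = []
    for heading, ids in entries:
        links = ", ".join(
            f'<a href="{filename}#{elem_id}">{n}</a>'
            for n, elem_id in enumerate(ids, start=1)
        )
        lines.append(f"<p>{heading}, {links}</p>")
    return lines


if __name__ == "__main__":
    root = assign_ids(ET.fromstring(CHAPTER))
    print([elem.get("id") for elem in root])
    # The indexer records element IDs instead of page numbers.
    print(link_index([("element IDs", ["e0003"])], "ch02.xhtml"))
```

Because the IDs are generated mechanically, the publisher can run this kind of step once, before handing the files to the indexer.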
What about a digital-first workflow?
Some publishers have adopted a digital-first (aka XML-first) workflow: Manuscript in XML → ebook/print/HTML output. The book is finalized in the XML format, and can then be exported on demand to Print, EPUB, HTML, or other formats. As with the DTP file indexing process mentioned above, the indexer must manually insert anchors into the file unless the publisher’s XML workflow can add unique-ID numbering automatically. The process is roughly the same, but the advantage is that the index is “done” in the XML stage and does not have to be converted to another format later on down the road. In digital-first workflows, the digital version really is “first,” and not just a stepchild of the print book.
What is the potential of ebook indexes?
A precision hyperlinked index that directs the user to specific locations in the text is a valuable tool. Beyond this, however, digital indexes can offer much more. Any hyperlinked index, because it consists of hyperlinks that connect index entries with specific points in the text, has the potential to become an embedded index, where the index entries (tags) are located within the text file. Embedded indexes allow the index to be “recreated” from the text file, or part of the text file (a chapter, for example). Imagine combining several chapters from different books and having a new index generate itself! This is possible with embedded indexes. The embedded index terms can also be used as essential metadata about the book that can increase the book’s “findability” by potential purchasers.
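To make the idea of a self-generating index concrete, here is a small Python sketch that rebuilds an index from index tags embedded in the text of chapters taken from two different books. The markup is an assumption made for illustration only: the `data-index-term` attribute, the `ix` IDs, and the file names are hypothetical and are not drawn from any standard.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical chapters from two books, each carrying embedded index tags.
CHAPTERS = {
    "bookA_ch2.xhtml": (
        '<body><p><a id="ix1" data-index-term="anchors"/>Anchors mark locations.</p>'
        '<p><a id="ix2" data-index-term="metadata"/>Index terms double as metadata.</p></body>'
    ),
    "bookB_ch5.xhtml": (
        '<body><p><a id="ix1" data-index-term="anchors"/>Anchors again.</p></body>'
    ),
}


def build_index(chapters):
    """Collect embedded index tags and return {term: [link targets]}, sorted by term."""
    index = defaultdict(list)
    for filename, markup in chapters.items():
        for tag in ET.fromstring(markup).iter("a"):
            term = tag.get("data-index-term")
            if term:
                index[term].append(f"{filename}#{tag.get('id')}")
    # Alphabetical order, ignoring case, as in a conventional index.
    return dict(sorted(index.items(), key=lambda item: item[0].lower()))


if __name__ == "__main__":
    for term, targets in build_index(CHAPTERS).items():
        print(term, targets)
```

Passing a different selection of chapters to `build_index` yields a new index for that selection, and the collected terms could equally be exported as metadata.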
I want to move forward with ebook indexes. Where do I begin?
Here are some tips to get started:
- Utilize the talents of a skilled indexer to create a great index for your print AND digital books.
- Assemble a team that includes a production expert, a programmer, and an indexer to come up with a new or modified workflow that will allow you to include high-quality indexes in your digital products, while respecting time and budgetary constraints.
- “Try out” the new workflow on a couple of pilot projects, before implementing changes company-wide.
Copyright 2017 Stephen Ingle, WordCo Indexing Services
Checklist for Ebook Indexing
- Contact the indexer at the beginning of the project.
- Give the indexer a rundown of software tools to be used in the project.
- Give the indexer information about any conversion houses or post-processing for the ebook if conversion is not done in-house.
- Give the indexer an idea of what outputs will be created from the files: print, ebook, PDF files, web materials, chapters reused in other publications, etc.
- Give the indexer an idea of the stages in the process where files can be handed off for indexing or where the indexer can run special tools.
- Give the indexer an idea of any constraints on the project (budgetary, schedule, tool issues, translations, etc.).
- Decide how closely the hyperlinked entries should land on the ebook page: pinpointed to the sentence, pinpointed to the paragraph containing the sentence, or pinpointed to the top of what is the printed page.
- Assign a liaison to work with the indexer for tool decisions, testing and troubleshooting issues. This liaison should also be connected to in-house production, as well as any conversion houses or post-processing agencies that will be used.
- Ask for an estimate of the time needed to perform the work with the chosen tools, but be aware that time frames will need to account for troubleshooting if this is the first project using a particular set of tools.
- Allow time in the schedule for test conversions and ensure that testing is done before beginning index coding for the full ebook.
In the long run, publishers can plan ahead for ebook indexing in every project. The EPUB 3.1 Standard includes a specification for EPUB indexes that allows for new interactivity and new interfaces that can make use of index markup in ebook files. Establishing some practices department-wide will make projects ready for the new features.
What can be done now by publishers:
- Read through the EPUB 3.1 Standard to gain an understanding of where ebook indexing is headed.
- Investigate the use of scripts and anchor IDs in EPUB 3.1.
- Develop an anchor ID scheme, and add IDs to ebook files to be ready for EPUB 3.1.
- Put in anchor codes at the paragraph level or sentence level for index entries.
- Include active indexes as chapters.
- Look at interactive index interfaces to be ready for developments in reader or app support.
- Plan for re-use of metadata: wikis, handhelds, print, web pages.
- Advocate for more advanced reader software on ebook devices.
Copyright 2017 Jan Wright, Wright Information Indexing Services
Additional resources on indexes for ebooks
- “Indexes in Ebooks,” Steve Ingle, EPUBSecrets blog, July 23, 2015.
- “Executive Summary For Publishers: Indexes in Ebooks,” David Ream, 2014.
- “Visualizing Back-of-Book Indexes,” Ceilyn Boyd and Mitch Wade, The Indexer, 2012, vol. 30, no. 1, pp. 25–37.
- “Missing Entry: Whither the Ebook Index?,” Peter Meyers, A New Kind of Book blog, September 2, 2011.
- “Kindle and the Index,” James Lamb blog, May 1, 2011.
- “Ebook Indexes and User Interface Features,” Joe Wikert, Joe Wikert’s Digital Content Strategies blog, June 2010.
- “What Is Wrong with Full Text Searches,” James Lamb blog, February 10, 2004.
- “Matrix Revolutions: Ebook Indexing,” Pilar Wyman, eBookcraft 2016, Toronto, ON, Canada.
- “Ebook Indexes: Changes are Happening Fast,” Jan Wright.
Videos and podcasts:
- Tools of Change 2012 Proposal, Jan Wright, August 26, 2011.
- We Don’t Need Indexes in Ebooks, Right? video interview with Kevin Broccoli, O’Reilly Tools of Change for Publishing, March 21, 2012.
- Content Matters: Search Can’t Replace a High-Quality Index, Kevin Broccoli, O’Reilly Tools of Change for Publishing podcast, March 28, 2012.
- Ebook Indexing, Jan Wright, Ebook Ninjas podcast episode 68, May 8, 2012.