Embedded Index Information for Publishers – Digital Publications Indexing

What is an embedded index?

An embedded index consists of a set of tags embedded in the document, from which the index (i.e., an alphabetical list of terms with page locators) can be generated. Each embedded index code or tag records the index heading and subheading(s), and its position in the text marks the location the entry points to. Because the tags travel with the text, the text can reflow to another page, and regenerating the index updates the page locators to match. Embedded indexes and hyperlinked indexes are not the same: an embedded index does not have to be hyperlinked, and a hyperlinked index is not necessarily embedded.
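The idea can be illustrated with a minimal Python sketch. The `{{idx:...}}` tag syntax, the function names, and the fixed paragraphs-per-page layout are all assumptions made for illustration; real DTP software uses its own tag formats and layout engine.

```python
import re
from collections import defaultdict

# Hypothetical inline tag syntax for this sketch:
# {{idx:Heading}} or {{idx:Heading:Subheading}}
TAG = re.compile(r"\{\{idx:([^}]+)\}\}")

def paginate(paragraphs, per_page):
    """Toy layout: place a fixed number of paragraphs on each page."""
    return [paragraphs[i:i + per_page] for i in range(0, len(paragraphs), per_page)]

def build_index(pages):
    """Collect every tag and record the page it currently falls on."""
    index = defaultdict(set)
    for page_number, page in enumerate(pages, start=1):
        for paragraph in page:
            for match in TAG.finditer(paragraph):
                entry = tuple(match.group(1).split(":"))
                index[entry].add(page_number)
    return {entry: sorted(pages_hit) for entry, pages_hit in sorted(index.items())}

paragraphs = [
    "Accrual accounting records revenue when earned.{{idx:accounting:accrual basis}}",
    "Cash accounting records revenue when received.{{idx:accounting:cash basis}}",
    "Depreciation spreads cost over time.{{idx:depreciation}}",
    "A balance sheet lists assets and liabilities.{{idx:balance sheets}}",
]

# The same tagged text, laid out two ways: the locators follow the text.
index_before = build_index(paginate(paragraphs, per_page=2))
index_after = build_index(paginate(paragraphs, per_page=1))
```

In this sketch the "depreciation" entry points to page 2 in the first layout and page 3 in the second, without anyone touching the tag itself.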

Embedded indexes pre-date hyperlinked indexes. Two well-known examples of software that supports embedded index tags are Microsoft Word and Adobe InDesign. The original rationale was as follows: 1) the index can be prepared earlier in the workflow, before page numbers (folios) are final; 2) updated versions can reuse existing index codes or tags; 3) customized books can be created without having to re-index the contents (coded or tagged parts of one book, such as a chapter, can be combined with part of another book so that a new index can be generated); and 4) books can be translated and the index codes or tags still target the right paragraph, even though the text in the new language may be longer or shorter than the original.

Embedded indexes have been around for several decades. Why aren’t they more prevalent?

Embedded indexes have several drawbacks, actual or potential, that have impeded their widespread adoption by the publishing industry:

1. Creating any useful index takes significant time and effort. Manually embedding index tags at specific locations on the page, in addition to writing the index, can easily double the time required to create the index (although the time can be shortened somewhat by using macros and specialized embedding tools that work with desktop publishing (DTP) software). Consequently, production costs are higher, while budgets often are not. Since the vast majority of nonfiction books are never revised or updated, the extra cost usually cannot be justified.

2. The tags must be error-free in order to generate the index. If any of the tags are inadvertently corrupted or deleted during the book production process, the index may not generate correctly. For example, if an index entry refers to a span of pages, there may be “begins here” and “ends here” tags. If an editor later removes a text section that contains the “ends here” tag, that index entry will be affected. The result can be an interrupted workflow and troubleshooting with its associated costs.

3. Indexes are usually “book-specific.” They reflect the author’s word usage, and may not correspond to more generally accepted terms. If several chapters of one book containing embedded index tags are combined with chapters from another book to create a new custom book, the two partial indexes that have been combined to automatically generate the new merged index may not “mesh.” At the very least, the new index will need to be edited to make it usable.

4. Editing index codes/tags after the fact can be very time-consuming for the indexer or editor. Depending on the method used, each affected tag may have to be edited individually. Additionally, unless the indexer can use standalone indexing software in conjunction with index tagging (more on this below), index editing within DTP software in general can be painstakingly slow.
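The range-tag failure described in the second drawback is the kind of problem a simple automated check can catch before the index is generated. The following Python sketch assumes a hypothetical `{{idx-start:...}}` / `{{idx-end:...}}` tag pair; the syntax and the function name are invented for illustration and do not correspond to any particular product.

```python
import re

# Hypothetical paired range tags for this sketch:
# {{idx-start:entry}} ... {{idx-end:entry}}
RANGE_TAG = re.compile(r"\{\{idx-(start|end):([^}]+)\}\}")

def find_broken_ranges(text):
    """Return a list of human-readable problems with page-range tags."""
    open_ranges = set()
    problems = []
    for match in RANGE_TAG.finditer(text):
        kind, entry = match.group(1), match.group(2)
        if kind == "start":
            if entry in open_ranges:
                problems.append(f"'{entry}' starts again before it ends")
            open_ranges.add(entry)
        elif entry in open_ranges:
            open_ranges.remove(entry)
        else:
            problems.append(f"'{entry}' ends but never starts")
    # Anything still open lost its closing tag somewhere.
    problems.extend(f"'{entry}' starts but never ends" for entry in sorted(open_ranges))
    return problems

intact = ("{{idx-start:depreciation}}Straight-line method. "
          "Declining-balance method.{{idx-end:depreciation}}")
# An editor cut the second sentence, and the closing tag went with it.
edited = "{{idx-start:depreciation}}Straight-line method."

print(find_broken_ranges(intact))
print(find_broken_ranges(edited))
```

Running a check like this after each round of editing would flag the damaged entry at the point where it is cheapest to repair, rather than when the index fails to generate.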

This all sounds like a lot of work for a limited set of benefits. Is there any hope for embedded indexes?

Yes! While embedded indexes may not be necessary for all indexed nonfiction books, they can be extremely valuable in books that will be repurposed or frequently updated (for example, an introductory accounting textbook or an annual tax guide). They can also be used to produce hyperlinked indexes for digital versions. (For more information on hyperlinked indexes, see “Ebook Index Information for Publishers.”) Fortunately, there are ways to make the job of creating embedded indexes easier for the indexer.

How do we make the indexer’s tagging process easier and thus more cost-effective?

The key is to minimize the “grunt” work of tagging so that the indexer can focus on what they are really good at, i.e., creating a well-constructed and useful index with the help of dedicated computer-assisted indexing software. This can be accomplished in two ways:

1. By combining specialized indexing macro tools with desktop publishing systems that support embedding (Word, InDesign, FrameMaker). Several macro products allow the indexer to develop and edit the index in their own standalone indexing software, and then use the macros to embed the indexing codes into the DTP files. This allows a much quicker workflow and a higher-quality index, as the index is finalized before it is encoded in the DTP files. Another option is to use indexing tools such as Index-Manager, which allow the indexer to tag text and construct the index simultaneously.

2. By automatically inserting unique numeric IDs into the document, one for each element (heading, paragraph, table, or figure), at any desired degree of granularity. The indexer can then index “to” the ID number, just as they would normally index to the page number. Using the index file (often in a delimited spreadsheet format), the indexer or the publisher can then “swap out” the relevant ID numbers in the text for the corresponding index tags, which contain the index heading and subheading(s).
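The “swap out” step in the second method can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the `[[id:N]]` placeholder format, the three-column tab-delimited index file (heading, subheading, ID), and the `{{idx:...}}` output tag. A production tool would emit whatever tag format the target software expects.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical placeholder inserted automatically before each element.
ID_MARKER = re.compile(r"\[\[id:(\d+)\]\]")

def load_entries(tsv_text):
    """Read the indexer's file: heading, subheading, element ID (tab-delimited)."""
    entries = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        heading, subheading, element_id = row
        entries[element_id].append((heading, subheading))
    return entries

def swap_ids_for_tags(text, entries):
    """Replace each ID placeholder with the index tags that point to it."""
    def replace(match):
        tags = []
        for heading, subheading in entries.get(match.group(1), []):
            label = f"{heading}:{subheading}" if subheading else heading
            tags.append("{{idx:" + label + "}}")
        return "".join(tags)  # IDs with no entries are simply removed
    return ID_MARKER.sub(replace, text)

index_file = (
    "accounting\taccrual basis\t1\n"
    "depreciation\t\t2\n"
    "accounting\tcash basis\t1\n"
)
document = (
    "[[id:1]]Accrual and cash accounting differ.\n"
    "[[id:2]]Depreciation spreads cost.\n"
    "[[id:3]]Unindexed paragraph.\n"
)

tagged = swap_ids_for_tags(document, load_entries(index_file))
print(tagged)
```

Because the indexer's file is the single source of the entries, the index can be edited in standalone indexing software and the swap rerun, instead of editing tags one by one in the laid-out files.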

In both methods, incidentally, the index entries created by the indexer are now connected to specific locations within the document, and it is a relatively simple matter to make the index hyperlinked.
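As a rough illustration of why hyperlinking becomes simple once entries are tied to locations, the Python sketch below turns (heading, target ID) pairs into HTML list items. It assumes, hypothetically, that each indexed element in the output carries a matching `id` attribute; the function name and the numbered link labels are likewise illustrative choices.

```python
from collections import defaultdict
from html import escape

def hyperlinked_index(entries):
    """Build HTML list items from (heading, element_id) pairs."""
    grouped = defaultdict(list)
    for heading, element_id in entries:
        grouped[heading].append(element_id)
    lines = []
    for heading in sorted(grouped, key=str.lower):
        # One numbered link per location, in the order the entries were given.
        links = ", ".join(
            f'<a href="#{escape(target)}">{n}</a>'
            for n, target in enumerate(grouped[heading], start=1)
        )
        lines.append(f"<li>{escape(heading)} {links}</li>")
    return "\n".join(lines)

entries = [("depreciation", "p2"), ("accounting", "p1"), ("accounting", "p7")]
html_index = hyperlinked_index(entries)
print(html_index)
```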

What about workflows?

A distinction needs to be made between traditional print workflows (Manuscript → Desktop Publishing (DTP) → PDF/print and EPUB/MOBI) and digital-first workflows (XML → ebook/print/HTML). The latter workflow, while less common, is (in theory, anyway) more robust than the traditional approach. XML-first allows more flexibility in generating non-print forms of the publication since the content is stored in a more generic form than laid-out DTP files. However, to implement this workflow and have it function smoothly, the publisher needs a staff willing and able to embrace a radically different work process, where the focus is on various end formats, and not just the traditional laid-out print version. Additionally, such a workflow needs to be planned, and templates and XML schemas needed for the planned outputs have to be developed. While this is challenging, some publishers have been very successful working with an XML-first work process.

Whichever workflow is used, it takes some coordination between the publisher and the indexer to achieve success with embedded indexes. The publisher should consult with the indexer before the project is in full swing. For long-term workflow changes, the publisher should assemble a team composed of an editor, an indexer, and, in the digital-first scenario, a programmer. Enough time must be allowed to test the process to make sure the embedded index tags work as required.

As we look forward to more non-print, modular forms of publication (textbooks, for example), embedded XML index tags offer broad possibilities. Imagine that the user of a modular textbook could customize the index display so that only certain categories of index headings are shown. Perhaps the user of a book on entrepreneurship wishes to view only people’s names, with the other index entries suppressed, or to have the names appear in red. If the relevant index tags have been marked by category, and the reading device or app (a future iPad?) is capable, such a scenario is not only possible; it could give that publisher a truly competitive edge. We’re not there yet. But if we can dream big AND transcend the logistical hurdles, the future of embedded indexes is exciting!
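On the data side, the category filter imagined above is straightforward. The Python sketch below is purely hypothetical: the `IndexEntry` fields and the category labels are invented, and no existing reading app is being described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    heading: str
    category: str  # hypothetical label attached to the tag, e.g. "person" or "topic"
    locator: str

def visible_entries(entries, show_categories):
    """Keep only the entries whose category the reader has chosen to display."""
    return [entry for entry in entries if entry.category in show_categories]

entries = [
    IndexEntry("Carnegie, Andrew", "person", "ch2-p14"),
    IndexEntry("venture capital", "topic", "ch3-p02"),
    IndexEntry("Walker, Madam C. J.", "person", "ch4-p09"),
]

names_only = visible_entries(entries, {"person"})
print([entry.heading for entry in names_only])
```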

Checklist for Embedded Indexing

Contact the indexer at the beginning of the project.
Give the indexer a rundown of software tools to be used in the project.
Tell the indexer how you plan to use the book’s files: for example, new editions next year, reuse of chapters in a smaller book, translation, output as an ebook with a hyperlinked index, or a PDF file with an active index.
Tell the indexer at which stages of the process files can be handed off for indexing or the indexer can run special tools.
Give the indexer an idea of any constraints on the project (budgetary, schedule, tool issues, translations, etc.).
Decide with the indexer which method of embedding will work best.
Assign a liaison to work with the indexer for tool decisions, testing and troubleshooting issues. This liaison should also be connected to in-house production, macro writers if needed, and editorial.
Ask for an estimate of the time needed to perform the work with the chosen tools, but be aware that the time frame must allow for troubleshooting if this is the first project to use a particular set of tools.
Allow time in the schedule for testing and ensure that testing is done before beginning index embedding for the book.
Try to ensure that the indexer’s computer setup includes the correct fonts and file paths, so that the files display correctly. Large linked images can be replaced by low-resolution screen versions. The correct fonts ensure that the indexer does not see grossly enlarged text and can read mathematical equations and formulas correctly.
Ask the indexer to provide instructions for the care and feeding of the files after indexing, including illustrations of the coded entries, tips for not deleting or altering the codes, and how to regenerate the index.
Always regenerate the index on the publishing house’s own computers, so that the generated index reflects the final pagination with full font and image information. Educate all staff who work with the files about the index coding.


See also the resource documents Benefits of eIndexes and Matrix Flowcharts of Software Tools, Processes, and Outputs from “How to be the Interface between Publishers and Digital Indexing techniques,” Terry Casey, Jan Wright, and Pilar Wyman, ASI Conference, Cleveland, OH, April 28, 2018.
