Indexes on the Web – Digital Publications Indexing

Website Indexing

Website indexing, in the indexing profession, refers to creating a back-of-the-book style index on a website, with index entries and subentries hyperlinked to the relevant web pages or to anchors within pages. The website index page (not to be confused with the home page, which typically has the file name index.htm or index.html) is referred to as a “site index” or “A–Z index.” It complements the “site map,” which functions somewhat like a table of contents. Search engines usually achieve the desired results across the entire web, where billions of pages are indexed and the user merely wants some information on a topic rather than everything on it. Search engines on individual websites, by contrast, generally do not deliver the results that users would like. In the mid and late 1990s, some websites adopted a site A–Z index as an additional means for users to find information on their sites. Software to aid indexers in creating website indexes was also developed in the late 1990s.
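As a rough illustration of the structure described above, the following Python sketch turns a small list of entries and subentries into an A–Z index page fragment, with each entry linked to a page or to an anchor within a page. The entry data, file names, and the function name build_az_index are hypothetical and chosen only for this example; this is not how any particular indexing tool works.

```python
from collections import defaultdict
from html import escape

# Hypothetical index entries: (main entry, subentry or None, target URL).
# A target may be a whole page or an anchor within a page.
ENTRIES = [
    ("conferences", None, "conferences.html"),
    ("membership", "dues", "membership.html#dues"),
    ("membership", "renewal", "membership.html#renewal"),
    ("awards", None, "awards.html"),
]


def link(text, url):
    """Return an HTML hyperlink with escaped text and URL."""
    return f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'


def build_az_index(entries):
    """Return an HTML fragment with entries grouped under letter headings."""
    # Collect each main entry's own link (if any) and its subentries.
    grouped = defaultdict(lambda: {"url": None, "subs": []})
    for main, sub, url in entries:
        if sub is None:
            grouped[main]["url"] = url
        else:
            grouped[main]["subs"].append((sub, url))

    # Sort main entries alphabetically and bucket them by first letter.
    by_letter = defaultdict(list)
    for main in sorted(grouped, key=str.lower):
        by_letter[main[0].upper()].append(main)

    out = []
    for letter in sorted(by_letter):
        out.append(f'<h2 id="{letter}">{letter}</h2>')
        out.append("<ul>")
        for main in by_letter[letter]:
            info = grouped[main]
            # A main entry with no target of its own is shown as plain text.
            head = link(main, info["url"]) if info["url"] else escape(main)
            if info["subs"]:
                out.append(f"  <li>{head}")
                out.append("    <ul>")
                for sub, url in sorted(info["subs"], key=lambda s: s[0].lower()):
                    out.append(f"      <li>{link(sub, url)}</li>")
                out.append("    </ul>")
                out.append("  </li>")
            else:
                out.append(f"  <li>{head}</li>")
        out.append("</ul>")
    return "\n".join(out)


if __name__ == "__main__":
    print(build_az_index(ENTRIES))
```

A real index involves editorial decisions that a script cannot make, such as choosing entry wording, adding cross-references, and double-posting. The sketch covers only the mechanical step of laying out entries that an indexer has already written.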

Websites suitable for such A–Z indexes are comparable in size to a small-to-medium book, from 25 to a few hundred pages. Index maintenance is an issue, so most of the web pages need to be static to be suitable for an A–Z index. Website indexes are not difficult to update, but the skilled indexer who created the index is often not on staff to make periodic minor updates, and it is not worth the trouble to contract an indexer for 15–20 minutes every now and then.

Website index example: A–Z of the American Society for Indexing website.

Website indexing, however, did not become the commercial endeavor that indexers had hoped for. Websites have grown too large, too fast, and their underlying technologies keep changing. A–Z indexes can still be found on some websites, typically where someone connected to the website owner knows how to index and can maintain the index. On a public website, such an index is a good way for an indexer to showcase their skills.

Web content management systems ultimately provided findability options for large and changing websites, where search alone had failed. If implemented properly, a web content management system (such as Drupal, WordPress, Joomla!, or proprietary software such as Adobe Experience Manager, and SharePoint for intranets) can include a taxonomy, thesaurus, or other controlled metadata. When web pages are added or changed, these controlled vocabulary terms are applied (indexed) to the page or uploaded document. The onsite search engine then gives more weight to search strings that match controlled vocabulary terms and their variants than to mere full-text keyword matches. Unlike traditional website indexing, the indexing is to the page level only and not to an anchor within a page. This assigning of controlled vocabulary metadata is more akin to periodical/database indexing than it is to back-of-the-book indexing. Since an information professional would have created the taxonomy or other metadata along with instructions for indexing, those applying the terms/tags don’t need to be professional indexers. Those interested in the development of taxonomies for web content management should see the Taxonomies & Controlled Vocabularies SIG of ASI.
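The weighting idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the behavior of Drupal, WordPress, or any other named system: the vocabulary, the page data, the weights, and the function names are all hypothetical. A query is first resolved to a preferred term through its variants, and pages tagged with that term score higher than pages that merely contain the query string in their text.

```python
# Hypothetical controlled vocabulary: preferred term -> variant terms.
VOCABULARY = {
    "taxonomies": ["taxonomy", "controlled vocabularies"],
    "indexing software": ["indexing tools", "index software"],
}

# Hypothetical pages: full text plus the vocabulary terms assigned to the
# page when it was added or changed (page-level indexing only).
PAGES = {
    "/sigs/taxonomies": {
        "text": "Information about the special interest group and its meetings.",
        "terms": ["taxonomies"],
    },
    "/blog/post-12": {
        "text": "A member mentioned taxonomy work in passing during the conference.",
        "terms": [],
    },
    "/resources/software": {
        "text": "Tools that help indexers build indexes.",
        "terms": ["indexing software"],
    },
}

TERM_WEIGHT = 3.0  # assumed boost for a controlled vocabulary match
TEXT_WEIGHT = 1.0  # assumed weight for a plain full-text match


def resolve_term(query):
    """Map a query to a preferred term if it equals the term or a variant."""
    q = query.strip().lower()
    for preferred, variants in VOCABULARY.items():
        if q == preferred or q in variants:
            return preferred
    return None


def search(query):
    """Return (url, score) pairs, highest score first."""
    q = query.strip().lower()
    preferred = resolve_term(q)
    results = []
    for url, page in PAGES.items():
        score = 0.0
        # Pages tagged with the matching vocabulary term get the larger weight.
        if preferred is not None and preferred in page["terms"]:
            score += TERM_WEIGHT
        # A literal occurrence of the query in the text counts for less.
        if q in page["text"].lower():
            score += TEXT_WEIGHT
        if score > 0:
            results.append((url, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    for url, score in search("taxonomy"):
        print(url, score)
```

In this sketch, a search for "taxonomy" ranks the page tagged with "taxonomies" above the blog post that only mentions the word, even though the tagged page never uses the query string in its text.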

See also the page: Website Index Best Practices

Indexed Documents on the Web

There is also a role for indexes in documents posted on the web, whether as HTML or PDF files, that are large enough to benefit from an internal index. Such indexes are also called “web-mounted” indexes. The documents could be large single files or a collection of files/pages. This includes online books, which are not the same as ebooks, as the latter include navigation features utilized in an ereader application. As web content grows in all forms, the number of large documents on the web is also increasing. There are different methods for creating indexes for such web documents, depending on whether a print version index existed previously, would be created at the same time, or would not be created at all.

Resources

Software tools:

HTML Indexer – stand-alone software for indexing websites or collections of HTML documents. (Brown Inc.)

Web resources:

Indexing the Web, American Society of Indexers

Articles:

“Website Indexing,” Mary Coe, The Indexer, vol. 34, no. 1, March 2016, pp. 20–25.
“Repurposing Print Indexing for the Web,” David Ream, 2011.
“ANZSI Conference: ‘Birds of a Feather’ Session: Database and Web Indexing,” Prue Deacon and Kathy Simpson, ANZSI Newsletter, vol. 3, no. 5, 2007, p. 4.
“Web Indexing: Extending the Functionality of HTML Indexer,” Mike Unwalla, The Indexer, vol. 25, no. 2, 2006, revised August 5, 2009.
“Book-style Indexes for Websites,” Heather Hedden, IWP — Information Wissenschaft & Praxis, 2007, no. 8, pp. 433–436.
“Changes in website indexing,” Glenda Browne, IWP — Information Wissenschaft & Praxis, 2007, no. 8, pp. 437–440.
“HTML/Prep: Transforming Indexes for the Web,” Glenda Browne, Online Currents, vol. 17, no. 7, September 2002.
“Web Index Preparation with HTMLPrep,” David Ream, 2001.

Books and book chapters:

Indexing Specialties: Web Sites, Heather Hedden, American Society of Indexers, 2007.
Website Indexes: Visitors to Content in Two Clicks, James A. Lamb, Ardleigh, England: James A. Lamb, 2006.
“Web Indexes and Other Navigation Aids: Finding Information on Web Sites,” Fred Brown, in Index It Right! Advice from the Experts, volume 1, Enid L. Zafran (ed.), American Society of Indexers, 2005, pp. 121–146.
Website Indexing: Enhancing Access to Information Within Websites, 2nd edition, Glenda Browne and Jonathan Jermey, 2004. (Free download)