Types of indexing in information retrieval
Answers
Indexing proceeds in four stages: content specification, tokenization of documents, processing of document terms, and index building. The index can be stored in several data structures: the direct index, the document index, the lexicon, and the inverted index.
Abstract:
Indexing is an important process in Information Retrieval (IR) systems. It is the first step in the IR process and is central to efficient retrieval. Indexing reduces documents to the informative terms they contain and provides a mapping from each term to the documents containing it. Once an effective index has been built for a collection of documents, the retrieval process is simplified. Indexing proceeds in four stages: content specification, tokenization of documents, processing of document terms, and index building. The index can be stored in several data structures: the direct index, the document index, the lexicon, and the inverted index. An index can be built using different algorithms or schemes, such as single-pass in-memory indexing and blocked indexing. This paper explains the indexing process along with the data structures and algorithms used for indexing, and finally analyses the different indexing approaches with respect to time, memory usage, and mean average precision.
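To make the stages and data structures named in the abstract concrete, here is a minimal Python sketch, not the paper's implementation. It assumes the common reading of the four structures: the direct index maps a document to its terms, the document index holds per-document statistics (here just length), the lexicon holds per-term statistics (here just document frequency), and the inverted index maps a term to the documents containing it. The function names, the stopword list, and the sample documents are all hypothetical. Content specification is reduced to passing in plain text, and everything is built in memory in one pass, with none of the block handling that single-pass in-memory or blocked indexing would add for large collections.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "and", "to"}


def tokenize(text):
    """Tokenization stage: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_terms(tokens):
    """Term-processing stage: here, only stopword removal (no stemming)."""
    return [t for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Index-building stage: fill the four structures in a single in-memory pass."""
    direct_index = {}                   # doc_id -> {term: term frequency}
    document_index = {}                 # doc_id -> document length in processed terms
    inverted_index = defaultdict(dict)  # term -> {doc_id: term frequency}

    for doc_id, text in docs.items():
        terms = process_terms(tokenize(text))
        counts = Counter(terms)
        direct_index[doc_id] = dict(counts)
        document_index[doc_id] = len(terms)
        for term, tf in counts.items():
            inverted_index[term][doc_id] = tf

    # Lexicon: term -> document frequency (number of documents containing the term).
    lexicon = {term: len(postings) for term, postings in inverted_index.items()}
    return direct_index, document_index, lexicon, dict(inverted_index)


# Content specification stage, simplified: the text chosen for indexing per document.
docs = {
    1: "Indexing is the first step in information retrieval",
    2: "The inverted index maps terms to documents",
}

direct_index, document_index, lexicon, inverted_index = build_index(docs)
```

With these structures, answering a single-term query is a lookup such as `inverted_index.get("index", {})`, which is why retrieval becomes simple once the index exists.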