What is information retrieval?

Literally translated, information retrieval means the recovery of information. By definition, it is the process of selecting, from a large unstructured database, the information that matches a given information request.

According to SPORTINGOLOGY, information retrieval is one of the central tasks of a search engine: search engines collect information and data, then evaluate, process, store and retrieve it.

Meaning of information retrieval

The growing number of digitally available documents makes fast, targeted search a necessity. In the classic sense, information retrieval refers to the search for text documents. In principle, however, it must be possible to retrieve information from any kind of document, including multimedia documents.

In addition to the main application of search engines, the information retrieval process is also relevant for digital libraries, image databases or multimedia archives.

The characteristics of the search have an influence on the requirements and methods of information retrieval. This influence manifests itself, for example, as follows:

  • Database to be searched: a self-administered database differs greatly from a database on the Internet
  • Information request: concrete vs. rather vague idea when searching
  • Document type: Text in various formats (e.g. doc, pdf, html file), videos, images, audio files

Another problem in selecting the appropriate information is the uncertain knowledge of the information retrieval system, i.e. the system does not actually understand the document content. It can only apply certain methods, e.g. text statistics or term weighting, and it struggles with certain word uses, e.g. synonyms (different words with the same meaning) or homonyms (one word with several meanings).
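
The synonym and homonym problem can be illustrated with a minimal Python sketch. It assumes the simplest possible matching, an exact comparison of lower-cased words; the function name and the sample documents are made up for illustration and do not describe any particular search engine.

```python
def matches(query_term: str, document: str) -> bool:
    """Return True if the exact query term occurs as a word in the document."""
    return query_term.lower() in document.lower().split()

# Hypothetical mini collection, invented for this example.
docs = {
    "d1": "used car for sale",
    "d2": "automobile dealer in town",
    "d3": "river bank erosion",
    "d4": "bank account interest rates",
}

# Synonym problem: d2 is about cars too, but it uses the word "automobile",
# so an exact match on "car" misses it.
car_hits = [doc_id for doc_id, text in docs.items() if matches("car", text)]

# Homonym problem: "bank" matches both the river bank and the financial bank,
# although a searcher usually means only one of them.
bank_hits = [doc_id for doc_id, text in docs.items() if matches("bank", text)]

print(car_hits)
print(bank_hits)
```

Here the query "car" finds only d1, while the query "bank" returns d3 and d4 together, which is exactly the kind of ambiguity a system without knowledge of the content cannot resolve on its own.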

To fulfil the information request better, i.e. to deliver a better result, an information retrieval system has various options for classifying the search request more precisely, e.g. by taking the context of the search into account. This is exactly what search engines like Google do: the search engine includes, for example, the user's previous queries.

Information retrieval models

There are different retrieval models, some of which build on one another. The most important information retrieval models include:

Boolean model

  • Oldest information retrieval model, based on Boolean logic, which dates from 1854
  • Content is found solely by combining search terms with the operators “and”, “or” and “not”
  • The results are not sorted, so there is no ranking: a document either matches the query or it does not
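
The Boolean model can be sketched in a few lines of Python. This is a minimal illustration that assumes whitespace tokenization and a small in-memory inverted index; the helper names (build_index, term) and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs: dict) -> dict:
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical mini collection, invented for this example.
docs = {
    "d1": "search engine ranking",
    "d2": "digital library search",
    "d3": "image archive",
    "d4": "search engine archive",
}
index = build_index(docs)

def term(word: str) -> set:
    """Documents containing the word (empty set if the word is unknown)."""
    return index.get(word, set())

# The three Boolean operators map directly onto set operations.
and_hits = term("search") & term("engine")    # search AND engine
or_hits = term("library") | term("archive")   # library OR archive
not_hits = term("search") - term("engine")    # search AND NOT engine

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

The results are plain sets: every document is either in or out, and nothing in the model says which hit should be shown first.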

Link-topological model

  • Is not based on the evaluation of the document content, but on the evaluation of the link structure between documents, which yields a ranking of the documents
  • The link structure allows a statement to be made about the authority of documents
  • One example is Google's PageRank, developed by Larry Page and Sergey Brin
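
The idea of ranking by link structure can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's production algorithm: the damping factor of 0.85, the fixed number of iterations and the four-page link graph are assumptions, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Approximate PageRank scores for a graph given as {page: [linked pages]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with a uniform distribution
    for _ in range(iterations):
        # Every page receives a small base share regardless of its inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked from A, B and D; nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this graph C collects the most inbound links and ends up with the highest score, while D, which no page links to, ends up with the lowest. The document text plays no role at all in the calculation.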

Text statistics

  • Examining the terms within a document
  • Terms are weighted via WDF and IDF
  • WDF: Within Document Frequency – relative frequency of a term within a document
  • IDF: Inverse Document Frequency – a measure of how rare a term is across the database: the fewer documents contain the term, the higher its IDF
  • The vector model is also part of the text statistics model: each text corresponds to a point (vector) in a term space, and the angle between two vectors indicates how similar the texts are to one another
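
The weighting and the vector model can be sketched together in Python. The sketch assumes the simplest definitions: WDF as plain relative frequency, as described above, and IDF as the natural logarithm of the total number of documents divided by the number of documents containing the term. Other formulations exist, for example logarithmically damped ones. The function names and the three-document corpus are made up for illustration.

```python
import math

def wdf(term: str, tokens: list) -> float:
    """Relative frequency of a term within one document."""
    return tokens.count(term) / len(tokens)

def idf(term: str, corpus: list) -> float:
    """log(N / n_t): higher for terms that occur in fewer documents."""
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / containing) if containing else 0.0

def vectorize(tokens: list, corpus: list, vocabulary: list) -> list:
    """Represent a document as a vector of WDF * IDF weights."""
    return [wdf(term, tokens) * idf(term, corpus) for term in vocabulary]

def cosine(u: list, v: list) -> float:
    """Cosine of the angle between two vectors (1 = same direction, 0 = unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical mini corpus, already split into tokens.
corpus = [
    "search engine ranking".split(),
    "search engine index".split(),
    "image archive".split(),
]
vocabulary = sorted({term for tokens in corpus for term in tokens})
vectors = [vectorize(tokens, corpus, vocabulary) for tokens in corpus]

sim_01 = cosine(vectors[0], vectors[1])  # documents share "search" and "engine"
sim_02 = cosine(vectors[0], vectors[2])  # documents share no terms
print(sim_01, sim_02)
```

The first two documents point in a similar direction because they share weighted terms, whereas the first and third documents share none and their similarity is zero.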

Cluster model

  • Documents are grouped into clusters according to their similarity
  • can speed up the search process, as only access to one document pool is required
  • Problems can arise if the clusters are incomplete or very large
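
A minimal Python sketch of the cluster idea follows. It assumes a very simple greedy grouping based on Jaccard similarity of word sets with an arbitrary threshold of 0.2; real systems typically use more robust clustering methods, and all names and sample documents here are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of common words among all words of both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs: dict, threshold: float = 0.2) -> list:
    """Greedily assign each document to the most similar cluster, or open a new one."""
    clusters = []  # each cluster: {"terms": all words seen, "members": document ids}
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        best = max(clusters, key=lambda c: jaccard(terms, c["terms"]), default=None)
        if best is not None and jaccard(terms, best["terms"]) >= threshold:
            best["members"].append(doc_id)
            best["terms"] |= terms
        else:
            clusters.append({"terms": set(terms), "members": [doc_id]})
    return clusters

def search(query: str, docs: dict, clusters: list) -> list:
    """Pick the best-matching cluster first, then look only at its documents."""
    query_terms = set(query.lower().split())
    best = max(clusters, key=lambda c: jaccard(query_terms, c["terms"]))
    return [d for d in best["members"] if query_terms & set(docs[d].lower().split())]

# Hypothetical mini collection, invented for this example.
docs = {
    "d1": "search engine ranking",
    "d2": "search engine index",
    "d3": "image archive photo",
    "d4": "photo archive collection",
}
clusters = cluster(docs)
hits = search("photo archive", docs, clusters)
print([c["members"] for c in clusters], hits)
```

The search only inspects the documents of one cluster, which is where the speed-up comes from. It also shows the risk: a relevant document that was assigned to a different cluster would never be examined.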

How do search engines use information retrieval?

Every internet search engine uses information retrieval to process search queries. For search engines, it is important to evaluate the information found and to sort it by importance and relevance, which produces the ranking. As soon as you enter a search term in the search field, the search engine returns the relevant information for that term from its stored data in the form of a search engine results page (SERP).

SEO therefore tries to make it easier for search engines to retrieve the information on the optimized page. One such measure is, for example, the WDF * IDF optimization of websites.
