Late interaction models allow for semantically rich interactions that enable precise retrieval across different modalities of unstructured data, including text and images.
In this context, “interaction” refers to the process of assessing how well a document matches a given search query by comparing their representations.
A dense retrieval model uses a neural network architecture to retrieve relevant documents for a search query.
Traditional retrieval methods commonly use “no-interaction” retrieval models, in which the search query and the documents are processed separately.
The main advantages of no-interaction retrieval models are that they are fast and computationally efficient.
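To make the no-interaction setup concrete, here is a minimal sketch in which each document is reduced to a single vector ahead of time and the query is encoded on its own. The vectors are hand-written toy values rather than real model output, and the names `cosine_similarity`, `rank`, `doc_vectors`, and `query_vector` are illustrative assumptions, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: toy embeddings standing in for the output of an encoder.
# Document vectors are computed once, before any query arrives.
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.6, 0.8, 0.0],
}

# The query is encoded separately; it never "sees" the documents.
query_vector = [1.0, 0.0, 0.0]

def rank(query_vec, docs):
    """Score every document with one vector comparison, highest first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank(query_vector, doc_vectors)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because the only query-time work is one cheap comparison per document, this style of retrieval is fast, but the query and document never influence each other's representation.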
These characteristics make full interaction models well suited to second-stage retrieval, such as reranking a curated set of candidate documents.
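The second-stage pattern can be sketched as a retrieve-then-rerank pipeline. Both scoring functions below are deliberately simple stand-ins and are assumptions for illustration only: `cheap_score` plays the role of a fast first-stage retriever, and `joint_score` plays the role of a full interaction model that looks at the query and document together. A real system would likely replace both with trained models.

```python
def cheap_score(query, doc):
    """Stand-in first-stage scorer: number of words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def joint_score(query, doc):
    """Stand-in for a full interaction model (hypothetical).

    It inspects query and document together by counting how many
    adjacent word pairs of the query also appear, in order, in the doc.
    """
    q_words = query.lower().split()
    d_words = doc.lower().split()
    doc_bigrams = set(zip(d_words, d_words[1:]))
    return sum(1 for bigram in zip(q_words, q_words[1:]) if bigram in doc_bigrams)

def retrieve_then_rerank(query, docs, k=2):
    """Keep the top-k candidates from the cheap stage, then rerank only those."""
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    return sorted(candidates, key=lambda d: joint_score(query, d), reverse=True)

docs = [
    "interaction late models",
    "late interaction models for retrieval",
    "image classification with convolutional networks",
]

result = retrieve_then_rerank("late interaction models", docs, k=2)
print(result)
```

The design point is that the expensive scorer runs on only `k` candidates instead of the whole collection, which is what makes a costly model practical in the second stage.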
extremely computationally expensive
scalable and contextually rich
storage requirements: they require an embedding for each token, so a complete set of vectors takes far more storage than a single embedding per document
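The per-token trade-off can be illustrated with a small sketch. It assumes a ColBERT-style "MaxSim" scoring rule (for each query token, take the best-matching document token and sum the results), which the text above does not name explicitly. The token vectors, the collection size, the tokens-per-document count, and the embedding dimension are all made-up numbers chosen for illustration, and `maxsim_score` and `storage_bytes` are hypothetical helper names.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Assumed late interaction score: for each query token embedding,
    take its highest dot product with any document token embedding,
    then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def storage_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=4):
    """Raw vector storage, assuming 4-byte floats and no compression."""
    return num_docs * vectors_per_doc * dim * bytes_per_value

# Toy token embeddings (assumption: real ones come from a trained model).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.5]]
score = maxsim_score(query_tokens, doc_tokens)
print("late interaction score:", score)

# Hypothetical collection: 1,000 documents, 128-dimensional embeddings.
single_vector = storage_bytes(num_docs=1000, vectors_per_doc=1, dim=128)
per_token = storage_bytes(num_docs=1000, vectors_per_doc=100, dim=128)
print("one vector per document:", single_vector, "bytes")
print("one vector per token (100 tokens/doc):", per_token, "bytes")
print("ratio:", per_token // single_vector)
```

Under these assumptions, storage grows in direct proportion to the number of token embeddings kept per document, which is the cost paid for the finer-grained matching.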
Disadvantages of no-interaction retrieval models lie in the lack of interaction between the search query and the documents.
multimodal late interaction retrieval models
vision language models (VLMs) instead of text-only models