JM Smoothing Language Model For Ranking
Posted by TRII in text-retrieval-and-search-engines
Introduction / Overview
We continue our work on the Text Retrieval and Search Engines course (see here for the last article). For the various topics covered in the course, the goal is to implement some of the methods and tools in order to gain some hands-on experience.
The previous articles looked at embedding documents and queries into an $n$-dimensional space, calculating the distances between query-document embeddings, and utilizing these distances as a measure of similarity between documents and queries.
Virginia Disc One Exploration
Posted by TRII in text-retrieval-and-search-engines
Introduction / Overview
Virginia Disc One was "the first large-scale distribution of test collections" used in Information Retrieval. The goal was to create a large test collection that hundreds of researchers could contribute to and use for work in the IR field. While many larger, more comprehensive collections have been released since VD1 first appeared in 1990, we thought it would be interesting (and fun!) to take a look at some of the contents and use them for future notebooks / articles.
Improved VSM Instantiation - Okapi BM25
Posted by TRII in text-retrieval-and-search-engines
Introduction / Overview
This is the second notebook related to work from the Coursera course, Text Retrieval and Search Engines. (See here for the first.) For the various topics covered in the course, the goal is to implement some of the methods and tools in order to gain some hands-on experience.
Simplest VSM Instantiation
Posted by TRII in text-retrieval-and-search-engines
Introduction
This notebook will (hopefully) be the first in a series of notebooks related to work from the Coursera course, Text Retrieval and Search Engines. For the various topics covered in the course, the goal is to implement some of the methods and tools in order to gain some hands-on experience.