ml-colabs

BM25sRetriever.retrieve:v5
Last updated: 12 months ago
import weave
import bm25s
from Stemmer import Stemmer

# Maps full language names to the two-letter codes used for stopword selection.
LANGUAGE_DICT = {
    "english": "en",
    "french": "fr",
    "german": "de"
}


@weave.op()
def retrieve(self, query: str, top_k: int = 2):
    """
    Retrieves the top-k most relevant chunks for a given query using the BM25 algorithm.

    This method tokenizes the input query with the BM25 tokenizer, applying
    language-specific stopwords and optional stemming. It then retrieves the
    top-k most relevant chunks from the BM25 index for the tokenized query.
    The results are returned as a list of dictionaries, each containing a chunk and
    its corresponding relevance score.

    Args:
        query (str): The input query string to search for relevant chunks.
        top_k (int, optional): The number of top relevant chunks to retrieve. Defaults to 2.
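
The flow described above (tokenize the query, score each chunk with BM25, return the top-k chunks with their scores) can be illustrated with a minimal standard-library sketch. This is not the bm25s implementation: the toy stopword list, the helper names `tokenize` and `bm25_top_k`, the parameters `k1` and `b`, the Lucene-style IDF, and the result keys `"chunk"` and `"score"` are all assumptions made for illustration, and stemming is omitted.

```python
import math
from collections import Counter

# Assumed toy stopword list; the real tokenizer picks stopwords by language.
STOPWORDS = {"the", "a", "of", "is", "in", "and", "to"}


def tokenize(text):
    # Lowercase, split on whitespace, strip punctuation, drop stopwords.
    words = (w.strip(".,;:!?()") for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]


def bm25_top_k(chunks, query, top_k=2, k1=1.5, b=0.75):
    # Hypothetical helper: rank chunks against the query with BM25.
    if not chunks:
        return []
    docs = [Counter(tokenize(c)) for c in chunks]
    lengths = [sum(d.values()) for d in docs]
    avg_len = sum(lengths) / len(docs) or 1.0  # avoid dividing by zero
    query_terms = set(tokenize(query))

    scores = []
    for doc, length in zip(docs, lengths):
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
            tf = doc[term]
            norm = tf + k1 * (1 - b + b * length / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)

    # Sort by score, highest first, and keep the top_k entries.
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    return [{"chunk": c, "score": s} for c, s in ranked[:top_k]]


if __name__ == "__main__":
    corpus = [
        "Aspirin reduces fever and pain.",
        "The heart pumps blood.",
        "Fever is a symptom of infection.",
    ]
    for hit in bm25_top_k(corpus, "fever", top_k=2):
        print(round(hit["score"], 3), hit["chunk"])
```

With equal term frequency, the length normalization term favors the shorter chunk, which is why chunk length affects ranking even when two chunks mention the query term the same number of times.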