BM25sRetriever.retrieve:v5
import weave
import bm25s
from Stemmer import Stemmer
LANGUAGE_DICT = {
    "english": "en",
    "french": "fr",
    "german": "de",
}
@weave.op()
def retrieve(self, query: str, top_k: int = 2):
    """
    Retrieves the top-k most relevant chunks for a given query using the BM25 algorithm.

    This method tokenizes the input query with the BM25 tokenizer, applying
    language-specific stopwords and, optionally, stemming. It then retrieves
    the top-k most relevant chunks from the BM25 index for the tokenized query.
    The results are returned as a list of dictionaries, each containing a chunk and
    its corresponding relevance score.

    Args:
        query (str): The input query string to search for relevant chunks.
        top_k (int, optional): The number of top relevant chunks to retrieve. Defaults to 2.