retrieve_docs

Retrieve the documents most relevant to a search query from DataBridge.

def retrieve_docs(
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    k: int = 4,
    min_score: float = 0.0,
    use_colpali: bool = True,
) -> List[DocumentResult]

Parameters

  • query (str): Search query text
  • filters (Dict[str, Any], optional): Metadata filters applied to the search. Defaults to None.
  • k (int, optional): Maximum number of results to return. Defaults to 4.
  • min_score (float, optional): Minimum similarity threshold. Defaults to 0.0.
  • use_colpali (bool, optional): Whether to use the ColPali-style embedding model for retrieval. This only works for documents ingested with use_colpali=True. Defaults to True.
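
The parameters above can be combined in a single call. The following is a minimal sketch, not part of the DataBridge SDK: the Retriever protocol, the search_text_only helper, and the "department" metadata key are hypothetical names introduced for illustration, and the flat key/value shape of filters is an assumption. The helper accepts any client that exposes retrieve_docs with the signature documented here.

```python
from typing import Any, Dict, List, Optional, Protocol


class Retriever(Protocol):
    """Hypothetical protocol: any client exposing the documented retrieve_docs."""

    def retrieve_docs(
        self,
        query: str,
        filters: Optional[Dict[str, Any]] = None,
        k: int = 4,
        min_score: float = 0.0,
        use_colpali: bool = True,
    ) -> List[Any]: ...


def search_text_only(db: Retriever, query: str, department: str) -> List[Any]:
    """Query without ColPali, restricted to one (hypothetical) metadata value."""
    return db.retrieve_docs(
        query,
        filters={"department": department},  # assumed filter shape and key
        k=10,                # ask for up to 10 results instead of the default 4
        min_score=0.3,       # pass a non-default similarity threshold
        use_colpali=False,   # for documents not ingested with use_colpali=True
    )
```

With a real client, the call would look like search_text_only(db, "machine learning", "research"), where db is the DataBridge instance from the example below.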

Returns

  • List[DocumentResult]: The matching documents, each with its relevance score, ID, metadata, and content

Example

from databridge.sync import DataBridge

db = DataBridge()

# Request up to 5 documents with a similarity score of at least 0.5
docs = db.retrieve_docs(
    "machine learning",
    k=5,
    min_score=0.5,
)

for doc in docs:
    print(f"Score: {doc.score}")
    print(f"Document ID: {doc.document_id}")
    print(f"Metadata: {doc.metadata}")
    print(f"Content: {doc.content}")
    print("---")

DocumentResult Properties

The DocumentResult objects returned by this method have the following properties:

  • score (float): Relevance score
  • document_id (str): Document ID
  • metadata (Dict[str, Any]): Document metadata
  • content (DocumentContent): Document content or URL
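
As a rough illustration of working with these properties, the sketch below ranks results by score on the client side. FakeDocumentResult and rank_ids are hypothetical names, not part of the SDK: the dataclass is only a stand-in that mirrors the four properties listed above so the example runs without a DataBridge server.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDocumentResult:
    """Stand-in mirroring the documented properties; not the library class."""

    score: float
    document_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    content: Any = None


def rank_ids(results: List[Any], threshold: float = 0.0) -> List[str]:
    """Return document IDs by descending score, dropping scores below threshold."""
    kept = [r for r in results if r.score >= threshold]
    kept.sort(key=lambda r: r.score, reverse=True)
    return [r.document_id for r in kept]


if __name__ == "__main__":
    sample = [
        FakeDocumentResult(score=0.2, document_id="a"),
        FakeDocumentResult(score=0.9, document_id="b"),
        FakeDocumentResult(score=0.5, document_id="c"),
    ]
    print(rank_ids(sample, threshold=0.4))  # prints ['b', 'c']
```

Because rank_ids only reads the score and document_id attributes, it should work unchanged on the list returned by retrieve_docs, assuming the real objects expose those attributes as documented.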