API Reference

This page provides detailed documentation for all classes and methods in the DataBridge Python SDK.

DataBridge

The main client class for synchronous operations.

DataBridge(uri: Optional[str] = None, timeout: int = 30, is_local: bool = False)

Parameters

uri: Optional connection URI in format databridge://owner_id:token@host
timeout: Request timeout in seconds (default: 30)
is_local: Whether connecting to a local development server (default: False)

Methods

`ingest_text`

ingest_text(
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
    rules: Optional[List[RuleOrDict]] = None,
    use_colpali: bool = True
) -> Document

Ingest a text document into DataBridge.

Parameters:

content: Text content to ingest
metadata: Optional document metadata dictionary
rules: Optional list of rules to apply during ingestion
use_colpali: Whether to use ColPali-style embedding model (default: True)

Returns: A Document object representing the ingested document

`ingest_file`

ingest_file(
    file: Union[str, bytes, BinaryIO, Path],
    filename: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    rules: Optional[List[RuleOrDict]] = None,
    use_colpali: bool = True
) -> Document

Ingest a file document into DataBridge.

Parameters:

file: File to ingest (path string, bytes, file object, or Path)
filename: Name of the file (required if file is bytes or file object)
metadata: Optional document metadata dictionary
rules: Optional list of rules to apply during ingestion
use_colpali: Whether to use ColPali-style embedding model (default: True)

Returns: A Document object representing the ingested document

`retrieve_chunks`

retrieve_chunks(
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    k: int = 4,
    min_score: float = 0.0,
    use_colpali: bool = True
) -> List[FinalChunkResult]

Retrieve relevant chunks based on a query.

Parameters:

query: Search query text
filters: Optional metadata filters dictionary
k: Number of results to return (default: 4)
min_score: Minimum similarity threshold (default: 0.0)
use_colpali: Whether to use ColPali-style embedding model (default: True)

Returns: List of FinalChunkResult objects

`retrieve_docs`

retrieve_docs(
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    k: int = 4,
    min_score: float = 0.0,
    use_colpali: bool = True
) -> List[DocumentResult]

Retrieve relevant documents based on a query.

Parameters:

query: Search query text
filters: Optional metadata filters dictionary
k: Number of results to return (default: 4)
min_score: Minimum similarity threshold (default: 0.0)
use_colpali: Whether to use ColPali-style embedding model (default: True)

Returns: List of DocumentResult objects

`query`

query(
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    k: int = 4,
    min_score: float = 0.0,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    use_colpali: bool = True
) -> CompletionResponse

Generate a completion using relevant chunks as context.

Parameters:

query: Query text
filters: Optional metadata filters dictionary
k: Number of chunks to use as context (default: 4)
min_score: Minimum similarity threshold (default: 0.0)
max_tokens: Maximum tokens in completion (optional)
temperature: Model temperature (optional)
use_colpali: Whether to use ColPali-style embedding model (default: True)

Returns: A CompletionResponse object

`list_documents`

list_documents(
    skip: int = 0,
    limit: int = 100,
    filters: Optional[Dict[str, Any]] = None
) -> List[Document]

List accessible documents.

Parameters:

skip: Number of documents to skip (default: 0)
limit: Maximum number of documents to return (default: 100)
filters: Optional filters dictionary

Returns: List of Document objects

`get_document`

get_document(document_id: str) -> Document

Get document metadata by ID.

Parameters:

document_id: ID of the document

Returns: A Document object

`create_cache`

create_cache(
    name: str,
    model: str,
    gguf_file: str,
    filters: Optional[Dict[str, Any]] = None,
    docs: Optional[List[str]] = None
) -> Dict[str, Any]

Create a new cache with specified configuration.

Parameters:

name: Name of the cache to create
model: Name of the model to use (e.g., “llama2”)
gguf_file: Name of the GGUF file to use
filters: Optional metadata filters for document selection
docs: Optional list of specific document IDs to include

Returns: Dictionary with cache configuration

`get_cache`

get_cache(name: str) -> Cache

Get a cache by name.

Parameters:

name: Name of the cache to retrieve

Returns: A Cache object

AsyncDataBridge

The main client class for asynchronous operations. Has the same methods as DataBridge but with async/await support.

AsyncDataBridge(uri: Optional[str] = None, timeout: int = 30, is_local: bool = False)

Cache

Class for interacting with a cache.

Cache(db: DataBridge, name: str)

Methods

`update`

update() -> bool

Update the cache with the latest documents.

Returns: Boolean indicating success

`add_docs`

add_docs(docs: List[str]) -> bool

Add specific documents to the cache.

Parameters:

docs: List of document IDs to add to the cache

Returns: Boolean indicating success

`query`

query(
    query: str,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None
) -> CompletionResponse

Query the cache.

Parameters:

query: Query text
max_tokens: Maximum tokens in completion (optional)
temperature: Model temperature (optional)

Returns: A CompletionResponse object

AsyncCache

Asynchronous version of Cache. Has the same methods but with async/await support.

Rule Classes

MetadataExtractionRule

MetadataExtractionRule(schema: Union[Type[BaseModel], Dict[str, Any]])

Rule for extracting metadata using a schema.

Parameters:

schema: Pydantic model class or dictionary schema

NaturalLanguageRule

NaturalLanguageRule(prompt: str)

Rule for transforming content using natural language.

Parameters:

prompt: Instruction for how to transform the content

Data Models

Document

Document(
    external_id: str,
    content_type: str,
    filename: Optional[str] = None,
    metadata: Dict[str, Any] = {},
    storage_info: Dict[str, str] = {},
    system_metadata: Dict[str, Any] = {},
    access_control: Dict[str, Any] = {},
    chunk_ids: List[str] = []
)

Represents a document in DataBridge.

ChunkResult

ChunkResult(
    content: str,
    score: float,
    document_id: str,
    chunk_number: int,
    metadata: Dict[str, Any] = {},
    content_type: str,
    filename: Optional[str] = None,
    download_url: Optional[str] = None
)

Represents a chunk result from a query.

FinalChunkResult

FinalChunkResult(
    content: str | PILImage,
    score: float,
    document_id: str,
    chunk_number: int,
    metadata: Dict[str, Any] = {},
    content_type: str,
    filename: Optional[str] = None,
    download_url: Optional[str] = None
)

Represents a processed chunk result from a query, with support for images.

DocumentResult

DocumentResult(
    score: float,
    document_id: str,
    metadata: Dict[str, Any] = {},
    content: DocumentContent
)

Represents a document result from a query.

CompletionResponse

CompletionResponse(
    completion: str,
    usage: Dict[str, int]
)

Represents a completion response from a query.

Exceptions

DataBridgeError

Base exception for all DataBridge SDK errors.

AuthenticationError

Exception raised for authentication-related issues.

ConnectionError

Exception raised for connection-related issues.

Getting Started

Guides

Reference

API Reference

API Reference

DataBridge

Parameters

Methods

`ingest_text`

`ingest_file`

`retrieve_chunks`

`retrieve_docs`

`query`

`list_documents`

`get_document`

`create_cache`

`get_cache`

AsyncDataBridge

Cache

Methods

`update`

`add_docs`

`query`

AsyncCache

Rule Classes

MetadataExtractionRule

NaturalLanguageRule

Data Models

Document

ChunkResult

FinalChunkResult

DocumentResult

CompletionResponse

Exceptions

DataBridgeError

AuthenticationError

ConnectionError

Getting Started

Guides

Reference

​API Reference

​DataBridge

​Parameters

​Methods

​ingest_text

​ingest_file

​retrieve_chunks

​retrieve_docs

​query

​list_documents

​get_document

​create_cache

​get_cache

​AsyncDataBridge

​Cache

​Methods

​update

​add_docs

​query

​AsyncCache

​Rule Classes

​MetadataExtractionRule

​NaturalLanguageRule

​Data Models

​Document

​ChunkResult

​FinalChunkResult

​DocumentResult

​CompletionResponse

​Exceptions

​DataBridgeError

​AuthenticationError

​ConnectionError

API Reference

DataBridge

Parameters

Methods

`ingest_text`

`ingest_file`

`retrieve_chunks`

`retrieve_docs`

`query`

`list_documents`

`get_document`

`create_cache`

`get_cache`

AsyncDataBridge

Cache

Methods

`update`

`add_docs`

`query`

AsyncCache

Rule Classes

MetadataExtractionRule

NaturalLanguageRule

Data Models

Document

ChunkResult

FinalChunkResult

DocumentResult

CompletionResponse

Exceptions

DataBridgeError

AuthenticationError

ConnectionError