Configure Morphik
Get the most out of Morphik by customizing it to your needs
Morphik Configuration Guide
Morphik’s behavior can be fully customized through the morphik.toml configuration file. This file is the central place to configure all aspects of the system, from API settings to model providers, database connections, and processing options.
Configuration Basics
The morphik.toml file uses the TOML format (Tom’s Obvious, Minimal Language) to organize settings into logical sections. Each section controls a specific component of the Morphik system:
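As a rough sketch of that layout, the skeleton below shows one TOML table per component. Only [api] and [registered_models] are named explicitly on this page; the other table names are assumptions inferred from the section headings in this guide, so check them against the example morphik.toml in the repository.

```toml
# Skeleton only. Table names other than [api] and [registered_models]
# are assumed from this guide's headings, not confirmed.
[api]                # API server settings
[auth]               # authentication and permissions
[registered_models]  # model definitions, referenced elsewhere by key
[completion]         # LLM used for text generation
[database]           # document metadata database
[embedding]          # vector embedding generation
[parser]             # document parsing and chunking
[parser.vision]      # multimodal processing for images and videos
[reranker]           # cross-encoder reranking
[storage]            # document file storage
[vector_store]       # where embeddings are stored
[rules]              # rules engine
[graph]              # knowledge graph
[morphik]            # general platform settings
[telemetry]          # observability
```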
When you first set up Morphik, you can:
- Run python quick_setup.py for an interactive setup process
- Copy and modify the example morphik.toml file
- Create your own configuration file manually
Registered Models Approach
Morphik uses a registered models approach, which allows you to define hundreds of different models and reference them throughout your configuration. This makes it easy to:
- Use different models for different tasks
- Mix and match models based on your needs (e.g. smaller models for simpler tasks)
- Configure model-specific settings in one place
- Switch between providers without changing the rest of your configuration
Defining Registered Models
First, register your models in the [registered_models] section:
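A minimal sketch of what such a section might look like. The entry keys (openai_gpt4o and so on) are arbitrary names chosen for illustration, and the model_name field and the model identifier strings are assumptions; only api_base is mentioned elsewhere on this page.

```toml
[registered_models]
# Each key is a name you choose; the inline table describes one model.
# The "model_name" field and the identifiers are illustrative assumptions.
openai_gpt4o = { model_name = "gpt-4o" }
claude_sonnet = { model_name = "claude-3-7-sonnet-latest" }
openai_embedding = { model_name = "text-embedding-3-small" }

# A local Ollama model; api_base points at the Ollama server
# (11434 is Ollama's default port).
ollama_llama = { model_name = "ollama_chat/llama3.2", api_base = "http://localhost:11434" }
```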
Using Registered Models
Then reference these models by their key in different sections:
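For instance, assuming each consuming section takes a model field whose value is a key from [registered_models] (the field name and the keys here are assumptions for illustration):

```toml
[registered_models]
openai_gpt4o = { model_name = "gpt-4o" }                      # hypothetical entry
openai_embedding = { model_name = "text-embedding-3-small" }  # hypothetical entry

[completion]
model = "openai_gpt4o"      # refers to the key above, not to a provider model ID

[embedding]
model = "openai_embedding"  # a different registered model for a different task
```

Swapping providers then means editing one entry in [registered_models], or pointing model at a different key, while the rest of the file stays the same.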
This approach gives you maximum flexibility to use different models for different components, and even switch between local and cloud models as needed.
Core Configuration Sections
API Server Settings
Controls how the API server runs:
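A small sketch of this section. The host setting appears in the Docker notes on this page; port and reload are assumed names shown only to illustrate the shape.

```toml
[api]
host = "localhost"  # use "0.0.0.0" when running in Docker
port = 8000         # assumed setting name and value
reload = true       # assumed setting: restart the server on code changes
```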
Authentication
Controls user authentication and permissions:
LLM Completion Settings
Configure the model used for generating text:
Database Connection
Choose your document metadata database:
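The POSTGRES_URI and MONGODB_URI variables listed under Environment Variables suggest that PostgreSQL and MongoDB are the available backends. A sketch, assuming the choice is made with a provider field (the field name and value are assumptions):

```toml
[database]
provider = "postgres"  # assumed field and value; the connection string itself
                       # comes from the POSTGRES_URI environment variable
```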
Embedding Configuration
Configure vector embeddings generation:
Document Parsing
Control how documents are processed and chunked:
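As an illustration of the kind of settings involved, the sketch below uses assumed field names for chunk size and overlap; confirm the actual names in the example morphik.toml.

```toml
[parser]
# Both field names and values are assumptions for illustration.
chunk_size = 1000     # target size of each chunk
chunk_overlap = 200   # amount shared between neighbouring chunks
```

Larger chunks keep more context together, while overlap reduces the chance that a sentence relevant to a query is split across a chunk boundary.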
Vision Processing
Configure multimodal processing for images and videos:
Reranking
Configure the cross-encoder reranker for improved retrieval:
Document Storage
Choose where to store your document files:
Vector Database
Configure where vector embeddings are stored:
Rules Engine
Configure document processing with rules:
Knowledge Graph
Configure entity extraction and graph building:
General Morphik Settings
Control core platform features:
Telemetry
Configure observability settings:
Environment Variables
Morphik uses environment variables for sensitive credentials and configuration that shouldn’t be stored in the configuration file:
- OPENAI_API_KEY: Your OpenAI API key
- ANTHROPIC_API_KEY: Your Anthropic API key
- JWT_SECRET_KEY: Secret for authentication tokens
- POSTGRES_URI: PostgreSQL connection string
- MONGODB_URI: MongoDB connection string
- AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY: For S3 storage
- ASSEMBLYAI_API_KEY: For video transcription
- UNSTRUCTURED_API_KEY: For enhanced document parsing (optional)
- HONEYCOMB_API_KEY: For telemetry (optional)
Complete Configuration Example
Here’s a complete example showing the structure of a morphik.toml file with registered models:
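The sketch below is an approximation of such a file, not a copy of the shipped example. Apart from [api], host, [registered_models], and api_base, which appear on this page, every table name, field name, and value is an assumption chosen for illustration.

```toml
# Illustrative sketch; most names and values here are assumed.
[api]
host = "localhost"
port = 8000

[registered_models]
openai_gpt4o = { model_name = "gpt-4o" }
openai_embedding = { model_name = "text-embedding-3-small" }

[completion]
model = "openai_gpt4o"      # key from [registered_models]

[embedding]
model = "openai_embedding"  # key from [registered_models]
dimensions = 1536           # must match the embedding model's output size

[database]
provider = "postgres"

[vector_store]
provider = "pgvector"

[storage]
provider = "local"
storage_path = "./storage"

[parser]
chunk_size = 1000
chunk_overlap = 200

[graph]
model = "openai_gpt4o"
```

Credentials such as OPENAI_API_KEY and POSTGRES_URI stay out of this file and are supplied through environment variables.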
Mixing and Matching Models
With the registered models approach, you can easily optimize your configuration:
- Use powerful models like Claude Opus for knowledge graph generation
- Use smaller, faster models for simple parsing tasks
- Use OpenAI models for some tasks and Anthropic models for others
- Use Ollama for local development and cloud models for production
For example:
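One way this could look, assuming each section selects its model with a model field. The field name, the [graph] and [rules] table names, and the model identifiers are assumptions for illustration.

```toml
[registered_models]
claude_opus = { model_name = "claude-3-opus-20240229" }
openai_gpt4o_mini = { model_name = "gpt-4o-mini" }
ollama_llama = { model_name = "ollama_chat/llama3.2", api_base = "http://localhost:11434" }

[graph]
model = "claude_opus"        # powerful model for knowledge graph generation

[rules]
model = "openai_gpt4o_mini"  # smaller, faster model for simpler tasks

[completion]
model = "ollama_llama"       # local model for development; point this at a
                             # cloud model's key for production
```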
Docker Configuration Adjustments
When using Docker, make these changes:
- Set host = "0.0.0.0" in the [api] section
- For Ollama integration, use models with api_base set to "http://ollama:11434"
- Use container names in your database connection strings
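Put together, the first two adjustments might look like the sketch below. The entry key ollama_llama, the model_name field, and its value are hypothetical; host and api_base are the settings named in the list above.

```toml
[api]
host = "0.0.0.0"  # listen on all interfaces so the port is reachable from outside the container

[registered_models]
# "ollama" is the container name of the Ollama service, used as the hostname.
ollama_llama = { model_name = "ollama_chat/llama3.2", api_base = "http://ollama:11434" }
```

The same idea applies to database connection strings: the hostname part becomes the name of the database container instead of localhost.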
For more details on Docker deployment, check out the DOCKER.md file in the repository.
Need Help?
If you’re having trouble with your configuration:
- Check the logs for error messages
- Run python quick_setup.py for guided setup
- Join our Discord community
- Open an issue on our GitHub repository