- Lemonade SDK - Windows-only, optimized for AMD GPUs and NPUs
- Ollama - Cross-platform (Windows, macOS, Linux), supports various hardware
Why Local Inference?
Running models locally provides several key advantages:
- Complete Privacy: Your data never leaves your machine
- No API Costs: Eliminate ongoing API expenses
- Low Latency: No network round-trips for inference
- Offline Capability: Work without internet connectivity
- Hardware Acceleration: Leverage your local GPU, NPU, or specialized AI processors
Lemonade SDK provides high-performance local inference on Windows, with optimizations for AMD hardware. It exposes an OpenAI-compatible API at `http://localhost:8020/api/v1` and is already configured in Morphik.
Example configuration:
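A sketch of what a registered model entry can look like (the entry name and model name here are illustrative, and the exact keys may differ in your `morphik.toml`):

```toml
[registered_models]
# Lemonade speaks the OpenAI API, so it registers like any other
# OpenAI-compatible endpoint; the model name below is illustrative.
lemonade_qwen = { model_name = "openai/Qwen2.5-VL-7B-Instruct", api_base = "http://localhost:8020/api/v1" }
```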
Built-in Support: Lemonade models are pre-configured in `morphik.toml` for both embeddings and completions. Simply install Lemonade Server and select the models in the UI.
System Requirements
- Windows 10/11 only (x86/x64)
- 8GB+ RAM (16GB recommended)
- Python 3.10+
- Optional but recommended:
  - AMD Ryzen AI 300 series (NPU acceleration)
  - AMD Radeon 7000/9000 series (GPU acceleration)
Quick Start
1. Install Lemonade SDK
Command Line Installation (Recommended):
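For example (the `lemonade-sdk` package name is an assumption here; check the Lemonade docs for the exact package and any extras matching your hardware):

```bash
# Assumed PyPI package name; verify against the Lemonade documentation.
pip install lemonade-sdk
```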
Alternative: Windows GUI Installer. If you prefer a GUI installer on Windows, download `Lemonade_Server_Installer.exe` from the Lemonade releases page.

2. Start Lemonade Server
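A sketch of the launch command, assuming a `lemonade-server` entry point (adjust to however your install exposes the server):

```bash
# Start the server with a large context window for RAG workloads.
lemonade-server serve --ctx-size 100000
```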
The `--ctx-size 100000` parameter is crucial for RAG applications, which need to handle large document contexts. Once running, the server's OpenAI-compatible API is available at `http://localhost:8020/api/v1`.
3. Configure Morphik - Two Options
Option 1: Using the UI (Recommended)
- Open Morphik UI and navigate to Settings
- Click “Add Custom Model”
- Enter the model details, using `http://localhost:8020/api/v1` as the API base
Option 2: Edit morphik.toml
Morphik comes with pre-configured Lemonade models. Check your `morphik.toml`:
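The pre-configured entries look roughly like the sketch below; the entry names, model names, and section keys here are illustrative, so verify them against the actual file:

```toml
[registered_models]
# Completion/vision model served by Lemonade (illustrative name).
lemonade_qwen = { model_name = "openai/Qwen2.5-VL-7B-Instruct", api_base = "http://localhost:8020/api/v1" }
# Embedding model served by Lemonade (illustrative name).
lemonade_embedding = { model_name = "openai/nomic-embed-text-v1.5", api_base = "http://localhost:8020/api/v1" }

# Point Morphik's completions and embeddings at those entries
# (section and key names assumed; check your morphik.toml).
[completion]
model = "lemonade_qwen"

[embedding]
model = "lemonade_embedding"
```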
When running Morphik in Docker, change `localhost` to `host.docker.internal` in the `api_base` URLs.

4. Download and Use Models
Once configured, you can:
- Select Lemonade models in the UI chat interface
- Download models as needed (see the sketch after this list)
- Start using Morphik with local inference!
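For example, a hypothetical pull (the subcommand and model name are assumptions; run `lemonade-server --help` to see what your install supports):

```bash
# Assumed subcommand and illustrative model name.
lemonade-server pull Qwen2.5-VL-7B-Instruct
```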
Supported Models
Lemonade supports a wide range of models, including:
- Vision Models: Qwen2.5-VL series (7B, 14B)
- Text Models: Llama, Mistral, Phi, Qwen families
- Embeddings: nomic-embed-text, BGE models
Performance Tips
- Model Quantization: Use GGUF quantized models for better performance
- Hardware Acceleration: Lemonade automatically detects and uses AMD GPUs/NPUs when available
- Memory Management: Models are cached after first download
Troubleshooting
Connection Issues
- Verify Lemonade Server is running: `curl http://localhost:8020/api/v1/models`
- For Docker: Use `host.docker.internal` instead of `localhost`
- Check firewall settings for port 8020
Model Loading Errors
- Ensure sufficient disk space (5-15GB per model)
- Try smaller quantized versions (Q4, Q5)
- Check model compatibility with `lemonade list`
Performance Issues
- Use GGUF quantized models for better performance
- Monitor GPU/NPU usage with system tools
- Adjust batch size and context length in model config