Local AI Provider
Run AI models locally with Ollama, LM Studio, LocalAI, or vLLM - complete privacy, no cloud required.
Features
- Local AI Model Support - Run AI models entirely on your own hardware
- Ollama Integration - Seamless integration with Ollama for easy model management
- LM Studio Support - Connect to LM Studio's local inference server
- LocalAI/vLLM Support - Connect to LocalAI and vLLM inference backends
- Privacy-First AI - Keep all data on-premises with zero cloud dependency
Requirements
| Requirement | Details |
|---|---|
| Dependencies | AICore |
| PHP Version | 8.2+ |
| Local Server | Ollama, LM Studio, LocalAI, or vLLM running locally |
| Hardware | Sufficient GPU/CPU for model inference |
Installation
- Ensure AI Core module is installed and enabled
- Install your preferred local AI backend (Ollama recommended)
- Enable the Local AI Provider module in Settings > Modules
- Configure connection to your local AI server
Configuration
Navigate to Settings > AI Core > Providers > Local AI to configure:
- Backend Type - Select Ollama, LM Studio, LocalAI, or vLLM
- Server URL - Local server address (default: http://localhost:11434 for Ollama)
- Default Model - Select default model from available local models
- Timeout - Request timeout for local inference
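The same options can also be expressed in code for reference. The sketch below is illustrative only - the config keys and env variable names are assumptions, not the module's actual schema:
// Illustrative config sketch - key and env variable names are assumptions,
// not the Local AI Provider module's actual schema.
return [
    'backend'       => env('LOCAL_AI_BACKEND', 'ollama'),   // ollama | lmstudio | localai | vllm
    'server_url'    => env('LOCAL_AI_URL', 'http://localhost:11434'),
    'default_model' => env('LOCAL_AI_MODEL', 'llama3.2'),
    'timeout'       => (int) env('LOCAL_AI_TIMEOUT', 120),  // seconds; local inference can be slow
];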
Ollama Setup
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama3.2
# Start server (usually automatic)
ollama serve
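Once Ollama is running, you can verify that it is reachable from your application. A minimal sketch using Laravel's HTTP client against Ollama's model-listing endpoint (assuming the default port):
use Illuminate\Support\Facades\Http;

// Ollama's /api/tags endpoint lists the models pulled locally.
$response = Http::get('http://localhost:11434/api/tags');

foreach ($response->json('models', []) as $model) {
    echo $model['name'] . PHP_EOL; // e.g. "llama3.2:latest"
}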
LM Studio Setup
- Download and install LM Studio from lmstudio.ai
- Load your preferred model
- Start the local server (default port: 1234)
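LM Studio's local server exposes an OpenAI-compatible API, so a quick reachability check (sketch, assuming the default port) can list the loaded models:
use Illuminate\Support\Facades\Http;

// LM Studio serves the OpenAI-style /v1/models endpoint on port 1234 by default.
$models = Http::get('http://localhost:1234/v1/models')->json('data', []);

foreach ($models as $model) {
    echo $model['id'] . PHP_EOL;
}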
Usage
Basic Local AI Request
use Modules\AICore\Services\AIService;
$aiService = app(AIService::class);
$response = $aiService->complete([
    'provider' => 'local',
    'model' => 'llama3.2',
    'prompt' => 'Summarize this document...',
]);
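Local inference can fail if the server is not running, the model has not been pulled, or the request times out, so wrapping the call in error handling is a sensible precaution. A minimal sketch using the same call as above:
try {
    $response = $aiService->complete([
        'provider' => 'local',
        'model' => 'llama3.2',
        'prompt' => 'Summarize this document...',
    ]);
} catch (\Throwable $e) {
    // Typical local failures: server offline, model not pulled, inference timeout.
    report($e);
}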
Specifying Backend
$response = $aiService->complete([
    'provider' => 'local',
    'backend' => 'ollama',
    'model' => 'mistral',
    'prompt' => 'Your prompt here',
]);
Supported Backends
| Backend | Default Port | Notes |
|---|---|---|
| Ollama | 11434 | Recommended for ease of use |
| LM Studio | 1234 | Good for experimentation |
| LocalAI | 8080 | OpenAI-compatible API |
| vLLM | 8000 | High-performance inference |
Supported Models
Llama Series (Meta)
| Model | Parameters | Context (tokens) | Description |
|---|---|---|---|
| llama3.2 | 3B | 128K | Latest, optimized for on-device inference |
| llama3.2:1b | 1B | 128K | Smallest, for resource-constrained devices |
| llama3.1 | 8B | 128K | Powerful general capabilities |
| llama3.1:70b | 70B | 128K | Large model (requires significant GPU) |
Mistral Series
| Model | Parameters | Context (tokens) | Description |
|---|---|---|---|
| mistral | 7B | 32K | Efficient with excellent performance |
| mistral-nemo | 12B | 128K | State-of-the-art with large context |
Code-Focused Models
| Model | Parameters | Description |
|---|---|---|
| codellama | 7B | Code generation and completion |
| deepseek-coder | 6.7B | Multi-language code support |
| qwen2.5-coder | 7B | Strong code understanding |
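Code-focused models are used through the same completion API shown above; for example, a code-generation request might simply target codellama:
// Same complete() call as above, pointed at a code-focused model.
$response = $aiService->complete([
    'provider' => 'local',
    'model' => 'codellama',
    'prompt' => 'Write a PHP function that validates an email address.',
]);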
Other Models
| Model | Parameters | Description |
|---|---|---|
| qwen2.5 | 7B | Excellent multilingual model |
| phi3 | 3.8B | Compact but powerful (Microsoft) |
| phi3.5 | 3.8B | Latest Phi with improvements |
| gemma2 | 9B | Google open model |
| gemma2:2b | 2B | Small and fast |
Embedding Models
| Model | Dimensions | Description |
|---|---|---|
| nomic-embed-text | 768 | Excellent local embeddings |
| mxbai-embed-large | 1024 | High-quality for RAG |
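Embedding models would be called through AI Core's embedding support; the embed() method name below is a hypothetical placeholder for illustration and may differ from the module's actual API:
// Hypothetical call - the actual AICore embedding method and parameters may differ.
$vector = $aiService->embed([
    'provider' => 'local',
    'model' => 'nomic-embed-text',
    'input' => 'Text to embed for similarity search...',
]);
// $vector would be a numeric array (768 dimensions for nomic-embed-text).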
Notes
- Local models require significant hardware resources
- No internet connection required after model download
- All data stays on your servers - complete privacy
- Performance depends on your hardware specifications
- GPU acceleration recommended for larger models