Local AI Provider

Run AI models locally with Ollama, LM Studio, LocalAI, or vLLM - complete privacy, no cloud required.

Features

  • Local AI Model Support - Run AI models entirely on your own hardware
  • Ollama Integration - Seamless integration with Ollama for easy model management
  • LM Studio Support - Connect to LM Studio's local inference server
  • LocalAI/vLLM Support - Support for LocalAI and vLLM backends
  • Privacy-First AI - Keep all data on-premises with zero cloud dependency

Requirements

Requirement     Details
Dependencies    AI Core
PHP Version     8.2+
Local Server    Ollama, LM Studio, LocalAI, or vLLM running locally
Hardware        Sufficient GPU/CPU for model inference

Installation

  1. Ensure the AI Core module is installed and enabled
  2. Install your preferred local AI backend (Ollama recommended)
  3. Enable the Local AI Provider module in Settings > Modules
  4. Configure connection to your local AI server

Configuration

Navigate to Settings > AI Core > Providers > Local AI to configure the following (an illustrative configuration sketch appears after this list):

  • Backend Type - Select Ollama, LM Studio, LocalAI, or vLLM
  • Server URL - Local server address (default: http://localhost:11434 for Ollama)
  • Default Model - Select default model from available local models
  • Timeout - Request timeout for local inference
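
These settings map naturally onto a small provider configuration. The sketch below is illustrative only, assuming a Laravel-style config file; the file path (config/local-ai.php), the env variable names, and the keys backend, url, default_model, and timeout are assumptions, not the module's documented schema.

<?php

// config/local-ai.php - hypothetical config file; key and env names are assumptions for illustration
return [
    'backend' => env('LOCAL_AI_BACKEND', 'ollama'),          // ollama | lmstudio | localai | vllm
    'url' => env('LOCAL_AI_URL', 'http://localhost:11434'),  // local server address
    'default_model' => env('LOCAL_AI_MODEL', 'llama3.2'),    // any model available on the backend
    'timeout' => env('LOCAL_AI_TIMEOUT', 120),               // request timeout in seconds
];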

Ollama Setup

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.2

# Start server (usually automatic)
ollama serve
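
Before configuring the module, you can confirm the Ollama server is reachable. The check below uses Laravel's HTTP client against Ollama's /api/tags endpoint, which lists the models you have pulled; it is a standalone diagnostic, not part of this module's API.

use Illuminate\Support\Facades\Http;

// Connectivity check: /api/tags returns the models available on the local Ollama server.
$response = Http::get('http://localhost:11434/api/tags');

if ($response->successful()) {
    // Each entry includes the model name, size, and last-modified time.
    $modelNames = collect($response->json('models'))->pluck('name');
    // e.g. ["llama3.2:latest", "mistral:latest"]
}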

LM Studio Setup

  1. Download and install LM Studio from lmstudio.ai
  2. Load your preferred model
  3. Start the local server (default port: 1234); a quick connectivity check is sketched below
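
To verify the server is up, query LM Studio's OpenAI-compatible endpoint. This sketch again uses Laravel's HTTP client directly and is independent of the module; it assumes the default port from step 3.

use Illuminate\Support\Facades\Http;

// LM Studio serves an OpenAI-compatible API; /v1/models lists the currently loaded models.
$models = Http::get('http://localhost:1234/v1/models')->json('data');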

Usage

Basic Local AI Request

use Modules\AICore\Services\AIService;

$aiService = app(AIService::class);

$response = $aiService->complete([
    'provider' => 'local',
    'model' => 'llama3.2',
    'prompt' => 'Summarize this document...',
]);
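
For troubleshooting, the same prompt can be sent straight to an Ollama backend, bypassing the module. The sketch below talks to Ollama's /api/generate endpoint with streaming disabled, using Laravel's HTTP client; it is a low-level debugging equivalent, not the module's documented API.

use Illuminate\Support\Facades\Http;

// Direct, non-streaming completion request against Ollama (useful when isolating issues).
$result = Http::timeout(120)->post('http://localhost:11434/api/generate', [
    'model' => 'llama3.2',
    'prompt' => 'Summarize this document...',
    'stream' => false, // return one JSON object instead of a stream of chunks
]);

$text = $result->json('response'); // the generated completion text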

Specifying Backend

$response = $aiService->complete([
    'provider' => 'local',
    'backend' => 'ollama',
    'model' => 'mistral',
    'prompt' => 'Your prompt here',
]);

Supported Backends

Backend      Default Port    Notes
Ollama       11434           Recommended for ease of use
LM Studio    1234            Good for experimentation
LocalAI      8080            OpenAI-compatible API
vLLM         8000            High-performance inference
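
The default ports above imply a base URL per backend. The helper below is only an illustration of that mapping; it is not part of the module, and the backend identifier strings are assumptions.

// Illustrative mapping from backend name to its default local base URL (per the table above).
function defaultBaseUrl(string $backend): string
{
    return match ($backend) {
        'ollama' => 'http://localhost:11434',
        'lmstudio' => 'http://localhost:1234',
        'localai' => 'http://localhost:8080',
        'vllm' => 'http://localhost:8000',
        default => throw new InvalidArgumentException("Unknown backend: {$backend}"),
    };
}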

Supported Models

Llama Series (Meta)

Model           Parameters    Context    Description
llama3.2        3B            128K       Latest, optimized for on-device inference
llama3.2:1b     1B            128K       Smallest, for resource-constrained devices
llama3.1        8B            128K       Powerful general capabilities
llama3.1:70b    70B           128K       Large model (requires significant GPU)

Mistral Series

Model           Parameters    Context    Description
mistral         7B            32K        Efficient with excellent performance
mistral-nemo    12B           128K       State-of-the-art with large context

Code-Focused Models

Model             Parameters    Description
codellama         7B            Code generation and completion
deepseek-coder    6.7B          Multi-language code support
qwen2.5-coder     7B            Strong code understanding

Other Models

Model        Parameters    Description
qwen2.5      7B            Excellent multilingual model
phi3         3.8B          Compact but powerful (Microsoft)
phi3.5       3.8B          Latest Phi with improvements
gemma2       9B            Google open model
gemma2:2b    2B            Small and fast

Embedding Models

Model                Dimensions    Description
nomic-embed-text     768           Excellent local embeddings
mxbai-embed-large    1024          High-quality for RAG
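
On an Ollama backend, these models can be exercised through Ollama's /api/embeddings endpoint. The sketch below calls it directly with Laravel's HTTP client; whether this module exposes its own embeddings helper is not covered here.

use Illuminate\Support\Facades\Http;

// Request an embedding vector for a piece of text from the local Ollama server.
$response = Http::post('http://localhost:11434/api/embeddings', [
    'model' => 'nomic-embed-text',
    'prompt' => 'Text to embed for semantic search or RAG',
]);

$vector = $response->json('embedding'); // array of 768 floats for nomic-embed-text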

Notes

  • Local models require significant hardware resources
  • No internet connection required after model download
  • All data stays on your servers - complete privacy
  • Performance depends on your hardware specifications
  • GPU acceleration recommended for larger models