Document AI
An intelligent document processing module that provides AI-powered OCR, classification, data extraction, and document analysis.
Features
- OCR Processing - Extract text from images, scanned documents, and PDFs
- Document Classification - Automatically categorize documents by type
- Data Extraction - Extract structured data from invoices, receipts, contracts
- Document Analysis - Summarize, analyze, and compare documents
- Multi-Language Support - Process documents in multiple languages
Requirements
| Requirement | Details |
|---|---|
| Dependencies | AI Core module (AICore) |
| PHP Version | 8.2+ |
| AI Provider | At least one AI provider configured in AI Core |
| Recommended | DocumentManagement module for integration |
Installation
- Ensure AI Core module is installed and enabled
- Enable the Document AI module in Settings > Modules
- Configure document processing settings and storage
Configuration
Navigate to Settings > AI Core > Document AI to configure:
- OCR Engine - Select OCR provider (AI-based or Tesseract)
- Classification Categories - Define document categories
- Extraction Templates - Configure data extraction templates
- Processing Queue - Set up background processing
Usage
OCR Processing
use Modules\DocumentAI\Services\DocumentAIService;
$documentAI = app(DocumentAIService::class);
// Extract text from image or PDF
$text = $documentAI->extractText([
    'file' => $uploadedFile,
    'language' => 'en',
]);
Document Classification
$classification = $documentAI->classify([
    'file' => $uploadedFile,
    'categories' => ['invoice', 'receipt', 'contract', 'report'],
]);
// Returns: { category: 'invoice', confidence: 0.95 }
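A confidence score is most useful when paired with a threshold. The sketch below assumes the result is available as a PHP array with `category` and `confidence` keys, as the comment above suggests. The `routeClassification` helper, the `manual_review` label, and the 0.8 threshold are hypothetical and not part of the module.

```php
<?php

declare(strict_types=1);

/**
 * Decide what to do with a classification result.
 *
 * Hypothetical helper: assumes $result has 'category' (string) and
 * 'confidence' (float between 0 and 1) keys.
 */
function routeClassification(array $result, float $threshold = 0.8): string
{
    $category = $result['category'] ?? null;
    $confidence = (float) ($result['confidence'] ?? 0.0);

    // Missing category or low confidence: hand off to a human reviewer.
    if ($category === null || $confidence < $threshold) {
        return 'manual_review';
    }

    return $category;
}

echo routeClassification(['category' => 'invoice', 'confidence' => 0.95]), PHP_EOL; // invoice
echo routeClassification(['category' => 'receipt', 'confidence' => 0.42]), PHP_EOL; // manual_review
```

A suitable threshold depends on how costly a misfiled document is, so it is worth tuning against a sample of your own documents.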
Data Extraction
// Extract invoice data
$data = $documentAI->extract([
    'file' => $invoiceFile,
    'template' => 'invoice',
]);
// Returns structured data:
// {
//     vendor: "ABC Corp",
//     invoice_number: "INV-001",
//     date: "2024-01-15",
//     total: 1500.00,
//     line_items: [...]
// }
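Because extraction accuracy varies with document quality, it can help to sanity-check the result before using it. The following is a minimal sketch, assuming the result is a PHP array with the keys shown in the sample above. The `validateInvoiceData` helper is hypothetical, and so is the `amount` key on each line item, since the sample does not show the line item structure.

```php
<?php

declare(strict_types=1);

/**
 * Return a list of problems found in extracted invoice data.
 *
 * Hypothetical helper: the top-level keys match the sample result, but the
 * 'amount' key on each line item is an assumption.
 */
function validateInvoiceData(array $data): array
{
    $problems = [];

    // Every field from the sample result should be present.
    foreach (['vendor', 'invoice_number', 'date', 'total', 'line_items'] as $field) {
        if (!array_key_exists($field, $data)) {
            $problems[] = "missing field: {$field}";
        }
    }

    // The date in the sample uses the Y-m-d format; reject anything else,
    // including impossible dates that PHP would silently roll over.
    if (isset($data['date'])) {
        $parsed = DateTimeImmutable::createFromFormat('!Y-m-d', (string) $data['date']);
        if ($parsed === false || $parsed->format('Y-m-d') !== $data['date']) {
            $problems[] = 'date is not in Y-m-d format';
        }
    }

    // Line item amounts should add up to the invoice total (within one cent).
    if (isset($data['total'], $data['line_items']) && is_array($data['line_items'])) {
        $sum = array_sum(array_column($data['line_items'], 'amount'));
        if (abs($sum - (float) $data['total']) > 0.01) {
            $problems[] = 'line items do not add up to total';
        }
    }

    return $problems;
}

// Illustrative input only; the line items are made up for this example.
$problems = validateInvoiceData([
    'vendor' => 'ABC Corp',
    'invoice_number' => 'INV-001',
    'date' => '2024-01-15',
    'total' => 1500.00,
    'line_items' => [
        ['description' => 'Consulting', 'amount' => 1000.00],
        ['description' => 'Support', 'amount' => 500.00],
    ],
]);

var_dump($problems === []); // bool(true)
```

The sum check assumes the total carries no tax or discount beyond the line items; invoices that do would need an adjusted rule.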
Document Analysis
// Summarize document
$summary = $documentAI->summarize([
    'file' => $documentFile,
    'max_length' => 500,
]);
// Compare documents
$comparison = $documentAI->compare([
    'documents' => [$doc1, $doc2],
    'highlight_differences' => true,
]);
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/document-ai/ocr | Extract text from document |
| POST | /api/v1/document-ai/classify | Classify document |
| POST | /api/v1/document-ai/extract | Extract structured data |
| POST | /api/v1/document-ai/summarize | Summarize document |
Extraction Templates
| Template | Extracts |
|---|---|
| Invoice | Vendor, invoice number, date, amounts, line items |
| Receipt | Merchant, date, total, items |
| Contract | Parties, dates, terms, signatures |
| Resume | Name, contact, experience, skills |
| ID Card | Name, ID number, date of birth, address |
Supported Formats
- Images: JPG, PNG, TIFF, BMP
- Documents: PDF (scanned and native)
- Office: DOCX, XLSX (with conversion)
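Checking a file against this list before submitting it can avoid a wasted processing job. The sketch below is one way to do that by file extension. The `supportedFormatGroup` helper is hypothetical, and the `jpeg` and `tif` aliases are assumptions not stated in the list above.

```php
<?php

declare(strict_types=1);

/**
 * Classify a file name against the supported formats listed above.
 *
 * Hypothetical helper. Returns 'image', 'pdf', 'office', or null when the
 * extension is not in the list. The 'jpeg' and 'tif' aliases are assumptions.
 */
function supportedFormatGroup(string $filename): ?string
{
    $groups = [
        'image'  => ['jpg', 'jpeg', 'png', 'tiff', 'tif', 'bmp'],
        'pdf'    => ['pdf'],
        'office' => ['docx', 'xlsx'],
    ];

    // Compare case-insensitively so "scan.TIFF" and "scan.tiff" match.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    foreach ($groups as $group => $extensions) {
        if (in_array($extension, $extensions, true)) {
            return $group;
        }
    }

    return null;
}

var_dump(supportedFormatGroup('scan.TIFF'));   // string(5) "image"
var_dump(supportedFormatGroup('report.docx')); // string(6) "office"
var_dump(supportedFormatGroup('notes.txt'));   // NULL
```

An extension check only looks at the file name, so a stricter setup might also inspect the MIME type of the uploaded content.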
Notes
- Large documents are processed in background queues
- Extraction accuracy depends on document quality; low-resolution, skewed, or low-contrast scans typically produce less reliable results
- Custom templates can be created for organization-specific documents
- All processed documents are stored securely with encryption