Document AI

Document AI is an AI-powered document processing module that provides OCR, classification, structured data extraction, and document analysis.

Features

  • OCR Processing - Extract text from images, scanned documents, and PDFs
  • Document Classification - Automatically categorize documents by type
  • Data Extraction - Extract structured data from invoices, receipts, contracts
  • Document Analysis - Summarize, analyze, and compare documents
  • Multi-Language Support - Process documents in multiple languages

Requirements

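| Requirement  | Details                                        |
|--------------|------------------------------------------------|
| Dependencies | AICore                                         |
| PHP Version  | 8.2+                                           |
| AI Provider  | At least one AI provider configured in AI Core |
| Recommended  | DocumentManagement module for integration      |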

Installation

  1. Ensure AI Core module is installed and enabled
  2. Enable the Document AI module in Settings > Modules
  3. Configure document processing settings and storage

Configuration

Navigate to Settings > AI Core > Document AI to configure:

  • OCR Engine - Select OCR provider (AI-based or Tesseract)
  • Classification Categories - Define document categories
  • Extraction Templates - Configure data extraction templates
  • Processing Queue - Set up background processing

Usage

OCR Processing

use Modules\DocumentAI\Services\DocumentAIService;

$documentAI = app(DocumentAIService::class);

// Extract text from an image or PDF
$text = $documentAI->extractText([
    'file' => $uploadedFile,
    'language' => 'en',
]);

Document Classification

$classification = $documentAI->classify([
    'file' => $uploadedFile,
    'categories' => ['invoice', 'receipt', 'contract', 'report'],
]);

// Returns: { category: 'invoice', confidence: 0.95 }
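A classification result is usually only acted on when its confidence is high enough. The sketch below shows one way to gate on it. It assumes the result is available as an array with `category` and `confidence` keys, as in the example above; the helper name `resolveCategory` and the 0.8 threshold are hypothetical and not part of the module.

```php
<?php
declare(strict_types=1);

/**
 * Return the category when the result is confident enough, otherwise null.
 * Assumes an array with 'category' and 'confidence' keys (an assumption
 * based on the example output; the real return type may differ).
 */
function resolveCategory(array $classification, float $minConfidence = 0.8): ?string
{
    $category = $classification['category'] ?? null;
    $confidence = (float) ($classification['confidence'] ?? 0.0);

    // Below the threshold, return null so the caller can queue a manual review.
    if (!is_string($category) || $confidence < $minConfidence) {
        return null;
    }

    return $category;
}

// Hard-coded result standing in for a live classify() call.
$category = resolveCategory(['category' => 'invoice', 'confidence' => 0.95]);
echo $category ?? 'needs-review', PHP_EOL;
```

Returning `null` rather than throwing keeps low-confidence documents on a normal code path, which fits a review queue better than exception handling.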

Data Extraction

// Extract invoice data
$data = $documentAI->extract([
    'file' => $invoiceFile,
    'template' => 'invoice',
]);

// Returns structured data:
// {
//     vendor: "ABC Corp",
//     invoice_number: "INV-001",
//     date: "2024-01-15",
//     total: 1500.00,
//     line_items: [...]
// }
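Because extraction accuracy varies with document quality, it can help to sanity-check extracted data before storing it. The following is a minimal sketch, assuming the result is an array with the field names shown above. The function `validateInvoiceData` and its specific checks are illustrative, not part of the module.

```php
<?php
declare(strict_types=1);

/**
 * Return a list of problems found in extracted invoice data.
 * Field names follow the example output; the checks are illustrative.
 */
function validateInvoiceData(array $data): array
{
    $errors = [];

    // Treat the scalar fields from the example output as required.
    foreach (['vendor', 'invoice_number', 'date', 'total'] as $field) {
        if (!isset($data[$field]) || $data[$field] === '') {
            $errors[] = "Missing field: {$field}";
        }
    }

    // The example uses a YYYY-MM-DD date; reject anything that does not round-trip.
    if (isset($data['date']) && is_string($data['date'])) {
        $parsed = DateTimeImmutable::createFromFormat('!Y-m-d', $data['date']);
        if ($parsed === false || $parsed->format('Y-m-d') !== $data['date']) {
            $errors[] = 'Invalid date: ' . $data['date'];
        }
    }

    // Totals must be numeric and non-negative.
    if (isset($data['total']) && (!is_numeric($data['total']) || $data['total'] < 0)) {
        $errors[] = 'Invalid total';
    }

    return $errors;
}

// Hard-coded data standing in for a live extract() call.
$errors = validateInvoiceData([
    'vendor' => 'ABC Corp',
    'invoice_number' => 'INV-001',
    'date' => '2024-01-15',
    'total' => 1500.00,
    'line_items' => [],
]);
echo $errors === [] ? 'ok' : implode(', ', $errors), PHP_EOL;
```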

Document Analysis

// Summarize a document
$summary = $documentAI->summarize([
    'file' => $documentFile,
    'max_length' => 500,
]);

// Compare documents
$comparison = $documentAI->compare([
    'documents' => [$doc1, $doc2],
    'highlight_differences' => true,
]);

API Endpoints

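| Method | Endpoint                      | Description                |
|--------|-------------------------------|----------------------------|
| POST   | /api/v1/document-ai/ocr       | Extract text from document |
| POST   | /api/v1/document-ai/classify  | Classify document          |
| POST   | /api/v1/document-ai/extract   | Extract structured data    |
| POST   | /api/v1/document-ai/summarize | Summarize document         |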
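The endpoints above could be called from an external PHP client roughly as follows. This is a sketch under stated assumptions: the multipart field name `file`, Bearer-token authentication, and the helper names `documentAiUrl` and `postDocument` are all hypothetical, since the request format is not documented here. Only the endpoint paths come from the table.

```php
<?php
declare(strict_types=1);

// Paths copied from the endpoint table above.
const DOCUMENT_AI_ENDPOINTS = [
    'ocr'       => '/api/v1/document-ai/ocr',
    'classify'  => '/api/v1/document-ai/classify',
    'extract'   => '/api/v1/document-ai/extract',
    'summarize' => '/api/v1/document-ai/summarize',
];

/** Build the full URL for a documented operation. */
function documentAiUrl(string $baseUrl, string $operation): string
{
    if (!array_key_exists($operation, DOCUMENT_AI_ENDPOINTS)) {
        throw new InvalidArgumentException("Unknown operation: {$operation}");
    }

    return rtrim($baseUrl, '/') . DOCUMENT_AI_ENDPOINTS[$operation];
}

/**
 * POST a file to an endpoint and return the raw response body.
 * Assumptions: multipart field 'file' and Bearer-token auth. Requires ext-curl.
 */
function postDocument(string $baseUrl, string $operation, string $path, string $token): string
{
    $ch = curl_init(documentAiUrl($baseUrl, $operation));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => ['file' => new CURLFile($path)],
        CURLOPT_HTTPHEADER     => ["Authorization: Bearer {$token}", 'Accept: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    return $body;
}
```

A call such as `postDocument('https://example.test', 'ocr', '/path/to/scan.pdf', $token)` would then send the file to the OCR endpoint; the host and token here are placeholders.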

Extraction Templates

| Template | Extracts                                         |
|----------|--------------------------------------------------|
| Invoice  | Vendor, invoice number, date, amounts, line items |
| Receipt  | Merchant, date, total, items                     |
| Contract | Parties, dates, terms, signatures                |
| Resume   | Name, contact, experience, skills                |
| ID Card  | Name, ID number, date of birth, address          |

Supported Formats

  • Images: JPG, PNG, TIFF, BMP
  • Documents: PDF (scanned and native)
  • Office: DOCX, XLSX (with conversion)
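Uploads can be screened against this list before they are sent for processing. The helper below is a minimal sketch: `supportedFormatGroup` is a hypothetical name, it inspects only the file extension, and the `jpeg` and `tif` aliases are an assumption added for convenience rather than something the list states.

```php
<?php
declare(strict_types=1);

// Extensions from the list above; 'jpeg' and 'tif' are assumed aliases.
const SUPPORTED_FORMATS = [
    'image'    => ['jpg', 'jpeg', 'png', 'tiff', 'tif', 'bmp'],
    'document' => ['pdf'],
    'office'   => ['docx', 'xlsx'],
];

/**
 * Return the format group for a filename, or null if it is unsupported.
 * Checks the extension only; it does not inspect the file contents.
 */
function supportedFormatGroup(string $filename): ?string
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    foreach (SUPPORTED_FORMATS as $group => $extensions) {
        if (in_array($extension, $extensions, true)) {
            return $group;
        }
    }

    return null;
}

foreach (['scan.TIFF', 'contract.pdf', 'notes.txt'] as $name) {
    echo $name, ': ', supportedFormatGroup($name) ?? 'unsupported', PHP_EOL;
}
```

An extension check is cheap but easy to fool, so a production check would likely also verify the MIME type of the uploaded file.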

Notes

  • Large documents are processed in background queues
  • Extraction accuracy depends on document quality
  • Custom templates can be created for organization-specific documents
  • All processed documents are stored securely with encryption