Air-Gapped AI: Running Local LLMs for Sensitive Evidence Processing

Cloud LLMs are powerful, convenient, and completely unacceptable for certain categories of evidence. When you're processing classified documents, attorney-client privileged material, or evidence in active litigation, sending content to an external API creates a risk you can't justify, regardless of the provider's security posture.

The alternative: run the AI locally, on infrastructure you control, with no external network connectivity.

Why Air-Gap

The case for air-gapped AI processing is straightforward:

▶Data sovereignty: Evidence never leaves your physical control
▶No third-party access: No API provider handles your data; encryption in transit still leaves it readable at the provider's end
▶No logging concerns: No external service logs your queries or responses
▶Compliance: Satisfies requirements that prohibit cloud processing of sensitive data
▶Privilege protection: Privileged material stays on a machine you control, so there is no disclosure to a third party that could put attorney-client privilege at risk

The tradeoff is capability. Local models are smaller and less capable than frontier cloud models. But for many forensic analysis tasks, they're sufficient.

Hardware Requirements

Running useful LLMs locally requires serious hardware:

Minimum viable configuration:

▶64GB RAM
▶GPU with 24GB+ VRAM (NVIDIA RTX 4090 or A5000)
▶2TB NVMe SSD (models + evidence storage)
▶No network interface card (true air gap)

TCI's standard deployment:

▶128GB RAM
▶Dual GPU setup (2x RTX 4090 or A6000)
▶RAID-10 NVMe array
▶Hardware security module (HSM) for key management
▶No wireless hardware, physically disconnected ethernet

Model Selection

Not every model is appropriate for air-gapped forensic work. TCI evaluates models on:

▶License: Must allow on-premises deployment (no API-only models)
▶Size vs. capability: 7B-70B parameter range balances capability with hardware requirements
▶Task performance: Entity extraction, summarization, and classification accuracy on forensic datasets
▶Quantization tolerance: How much quality degrades at reduced precision (important for fitting on available VRAM)

Models we deploy:

▶Llama 3 variants (70B at 4-bit quantization for analysis, 8B for lightweight tasks)
▶Mistral/Mixtral (good performance-per-parameter ratio)
▶Fine-tuned models for domain-specific entity recognition
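
A rough weight-memory estimate shows why quantization tolerance matters for fitting models into available VRAM. The sketch below is back-of-the-envelope arithmetic, not a TCI sizing tool: the function names and the 20% allowance for KV cache and activations are assumptions, and real quantization formats often use somewhat more than their nominal bits per weight.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed runtime overhead fit in the given VRAM.

    overhead_fraction is an assumed allowance for KV cache and activations;
    the real figure depends on context length and batch size.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    scenarios = [
        ("70B at 4-bit, 2x 24GB GPUs", 70, 4, 48),
        ("70B at 4-bit, 1x 24GB GPU", 70, 4, 24),
        ("8B at 16-bit, 1x 24GB GPU", 8, 16, 24),
    ]
    for name, params, bits, vram in scenarios:
        weights = weight_memory_gb(params, bits)
        print(f"{name}: ~{weights:.0f} GB weights, fits={fits_in_vram(params, bits, vram)}")
```

Under these assumptions, a 70B model at 4 bits needs about 35GB for weights alone, which is why it suits a dual 24GB setup but not a single 24GB card without offloading part of the model to system RAM.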

Software Stack

The air-gapped system runs a minimal software stack:

▶OS: Hardened Linux (no unnecessary services, no package manager connected to repositories)
▶Inference: llama.cpp or vLLM for model serving
▶Orchestration: Custom Python pipeline for task management
▶Storage: LUKS-encrypted volumes with WORM-compliant evidence storage
▶Audit: Comprehensive logging to append-only local storage
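
One way to make an append-only audit log tamper-evident is to chain entries by hash, so that editing or removing an earlier entry breaks the chain for everything after it. This is an illustrative sketch, not a description of TCI's logging: the field names and the in-memory list are assumptions, and a real deployment would persist entries to storage that enforces append-only writes.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # Assumed placeholder "previous hash" for the first entry.


def _entry_hash(prev_hash: str, timestamp: str, actor: str, action: str) -> str:
    """Hash the entry's fields together with the previous entry's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "ts": timestamp, "actor": actor, "action": action},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str, timestamp: str = None) -> dict:
    """Append a new entry that commits to the hash of the entry before it."""
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "ts": ts, "actor": actor, "action": action}
    entry["hash"] = _entry_hash(prev, ts, actor, action)
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev = GENESIS
    for entry in log:
        expected = _entry_hash(entry["prev"], entry["ts"], entry["actor"], entry["action"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

A chain like this does not detect truncation of the most recent entries on its own; recording the latest hash somewhere separate (for instance, in a printed custody record) is one way to cover that gap.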

Software is installed from physical media (such as USB drives) only after its checksums have been verified. Updates follow the same physical transfer process.
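
Checksum verification on the isolated machine can be as small as the sketch below. It assumes the expected SHA-256 digest arrives through a separate trusted channel (for example, a printed or signed manifest); the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte model files never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Compare the file's digest to the expected one, ignoring case and stray whitespace."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

An installer script would refuse to proceed whenever verify_artifact returns False, and would record the result in the audit log either way.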

Workflow

Evidence Ingestion

Evidence enters the air-gapped system via:

▶Physical media (USB, external drive) with chain of custody documentation
▶One-way data diodes (hardware-enforced: data in, no data out)
▶Optical media for maximum assurance

Processing

The local LLM pipeline processes evidence through:

▶Document parsing and text extraction
▶Entity extraction (names, dates, organizations, financial entities)
▶Relationship mapping between extracted entities
▶Summarization of key documents
▶Classification by relevance and sensitivity
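
The stages above can be sketched as a small pipeline. This is a simplified illustration rather than TCI's orchestration code: it assumes text has already been extracted from the document, the `generate` callable is a hypothetical wrapper around whatever local inference server is in use, the label set is invented for the example, and the relationship step is a naive co-occurrence placeholder.

```python
import json
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical interface: any function that sends a prompt to the local model
# and returns its text reply.
Generate = Callable[[str], str]

ALLOWED_LABELS = {"relevant", "not_relevant", "sensitive"}  # Assumed label set.


@dataclass
class DocumentResult:
    doc_id: str
    entities: List[str] = field(default_factory=list)
    relationships: List[Tuple[str, str]] = field(default_factory=list)
    summary: str = ""
    label: str = "needs_review"


def extract_entities(text: str, generate: Generate) -> List[str]:
    """Ask the model for a JSON array of entity strings and parse it defensively."""
    raw = generate("List the named entities in this text as a JSON array of strings:\n" + text)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Unparseable model output yields no entities rather than a crash.
    if not isinstance(parsed, list):
        return []
    return sorted({item for item in parsed if isinstance(item, str)})


def map_relationships(entities: List[str]) -> List[Tuple[str, str]]:
    """Naive placeholder: treat every pair of entities in one document as related."""
    return list(combinations(entities, 2))


def classify(text: str, generate: Generate) -> str:
    """Accept the model's label only if it is in the allowed set."""
    label = generate("Classify this document as relevant, not_relevant, or sensitive:\n" + text)
    label = label.strip().lower()
    return label if label in ALLOWED_LABELS else "needs_review"


def process_document(doc_id: str, text: str, generate: Generate) -> DocumentResult:
    """Run extraction, relationship mapping, summarization, and classification in order."""
    entities = extract_entities(text, generate)
    return DocumentResult(
        doc_id=doc_id,
        entities=entities,
        relationships=map_relationships(entities),
        summary=generate("Summarize this document in two sentences:\n" + text).strip(),
        label=classify(text, generate),
    )
```

Passing the model in as a plain callable keeps the pipeline testable without a GPU and makes it easy to swap inference backends. Falling back to "needs_review" routes unexpected model output to a human instead of letting it pass silently.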

Results Export

Processed results (not raw evidence) are exported via:

▶Physical media with integrity verification
▶Printed reports for non-digital distribution
▶Encrypted exports with key management through the HSM

The key principle: raw evidence enters but doesn't leave the air-gapped system. Only processed, reviewed outputs are exported.
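
That principle can be enforced in code as well as in procedure. The sketch below is a hypothetical export gate: the `Artifact` fields and the two `kind` values are assumptions made for illustration, not part of any TCI tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str                          # Assumed values: "raw_evidence" or "processed_output".
    reviewed_by: Optional[str] = None  # Reviewer identity, set once a human has signed off.


def may_export(artifact: Artifact) -> bool:
    """Only processed outputs that carry a reviewer sign-off are eligible for export."""
    return artifact.kind == "processed_output" and bool(artifact.reviewed_by)


def select_for_export(artifacts: List[Artifact]) -> List[Artifact]:
    """Filter a batch down to the artifacts that pass the export rule."""
    return [artifact for artifact in artifacts if may_export(artifact)]
```

Treating anything that isn't explicitly a reviewed, processed output as non-exportable means an unrecognized or mislabeled artifact is blocked rather than released.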

Operational Challenges

Model Updates

Updating models on an air-gapped system requires physical media transfer. TCI maintains a quarterly update cycle, with critical updates applied ad hoc.

Performance Tuning

Without internet access, troubleshooting performance issues requires preloaded documentation and experienced operators.

User Training

Analysts accustomed to cloud LLMs need training on the capabilities and limitations of local models. Managing expectations is critical: a local 70B model won't match GPT-4, but it handles most forensic analysis tasks adequately.

Conclusion

Air-gapped AI isn't about having the best model. It's about having a model that runs where your security requirements demand. For sensitive evidence processing, the reduction in capability is outweighed by the assurance that evidence never leaves your control.

TCI provides air-gapped AI deployment as a standard offering for clients handling highly sensitive evidence.