Self-Hosted Agent Infrastructure: Architecture for Isolated Operations

The previous article covered running local LLMs in air-gapped environments. This article addresses a harder problem: running autonomous AI agents — not just models, but agents that plan, execute, and iterate — on isolated infrastructure.

Agent vs. Model

A model responds to prompts. An agent uses models to pursue objectives. The distinction matters for infrastructure architecture because agents:

▶Maintain state across interactions
▶Execute tools and system commands
▶Make autonomous decisions about next steps
▶Require access to resources outside the model (databases, file systems, analysis tools)
▶Operate over extended time periods

Running a model in isolation is a solved problem. Running an agent in isolation introduces new architectural challenges.

Architecture Overview

TCI's self-hosted agent infrastructure uses a layered architecture:

Layer 1: Physical Infrastructure

▶Dedicated hardware with no wireless capabilities
▶Network isolation (air gap or strict firewall with allowlisted destinations only)
▶Hardware security module for cryptographic operations
▶Uninterruptible power supply with monitoring
▶Physical access controls (locked server room, access logging)

Layer 2: Operating Environment

▶Hardened Linux base (CIS Benchmark Level 2)
▶Container runtime (Podman — rootless, daemonless)
▶Local model serving (vLLM or llama.cpp)
▶File system encryption (LUKS)
▶Append-only audit logs

Layer 3: Agent Framework

▶Agent orchestration layer (custom or adapted from open source frameworks)
▶Tool registry: curated set of tools the agent can invoke
▶Permission system: per-tool and per-resource access controls
▶State management: persistent agent memory stored encrypted
▶Session management: isolated sessions per investigation

Layer 4: Evidence Infrastructure

▶WORM-compliant artifact storage
▶Hashing and timestamping pipeline
▶Chain of custody logging
▶Evidence ingestion and export interfaces
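
The hashing and timestamping step can be sketched as follows. This is a minimal illustration, assuming SHA-256 and a UTC timestamp taken from the system clock; the function name `fingerprint_artifact` and the record fields are hypothetical, not TCI's actual pipeline.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def fingerprint_artifact(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Hash a file in fixed-size chunks and return a custody record for it."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large evidence files never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "artifact": path.name,
        "sha256": digest.hexdigest(),
        "size_bytes": path.stat().st_size,
        # Assumes the system clock is already synchronized to a trusted source.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Computing the digest at ingestion and again at export lets a reviewer confirm that the artifact did not change while it was inside the system.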

Key Design Decisions

Agent Autonomy Boundaries

On cloud infrastructure, an agent that makes a mistake affects your cloud resources. On isolated infrastructure handling sensitive evidence, a mistake can compromise an investigation.

TCI implements three autonomy levels:

▶Full autonomy: Agent can execute without approval (reading files, running analysis, querying databases)
▶Approval required: Agent proposes an action and waits for human approval (modifying evidence stores, exporting data, executing destructive operations)
▶Prohibited: Actions the agent cannot take regardless of approval (network access on air-gapped systems, deleting evidence, modifying audit logs)
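
One way to express the three levels is a lookup table consulted before every action. The sketch below is illustrative only: the action names are hypothetical, and denying unknown actions by default is an assumption rather than something stated above.

```python
from enum import Enum


class Autonomy(Enum):
    FULL = "full"
    APPROVAL_REQUIRED = "approval_required"
    PROHIBITED = "prohibited"


# Hypothetical action names mapped to the three levels described above.
POLICY = {
    "read_file": Autonomy.FULL,
    "run_analysis": Autonomy.FULL,
    "query_database": Autonomy.FULL,
    "modify_evidence_store": Autonomy.APPROVAL_REQUIRED,
    "export_data": Autonomy.APPROVAL_REQUIRED,
    "network_access": Autonomy.PROHIBITED,
    "delete_evidence": Autonomy.PROHIBITED,
    "modify_audit_log": Autonomy.PROHIBITED,
}


def authorize(action: str, human_approved: bool = False) -> bool:
    """Return True if the action may run.

    Actions missing from POLICY are treated as prohibited (assumed default).
    """
    level = POLICY.get(action, Autonomy.PROHIBITED)
    if level is Autonomy.FULL:
        return True
    if level is Autonomy.APPROVAL_REQUIRED:
        return human_approved
    # Prohibited actions stay blocked even when a human has approved them.
    return False
```

The important property is the last branch: approval can unlock the middle tier, but it never unlocks a prohibited action.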

Tool Isolation

Each tool the agent invokes runs in its own container with:

▶No network access (the container's own networking is disabled, in addition to the host's isolation)
▶Read-only access to evidence (write access only to designated output directories)
▶Resource limits (CPU, memory, disk I/O)
▶Execution timeout

If a tool behaves unexpectedly, the damage is contained.
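
As a rough sketch, these constraints map onto `podman run` flags. The helper below only builds the argument list; the image name, mount points, and default limits are placeholders. The `--timeout` flag is available in recent Podman releases, and resource limits for rootless containers generally depend on cgroups v2 delegation, so both should be checked against the installed version.

```python
from pathlib import Path


def build_tool_command(
    image: str,
    tool_args: list[str],
    evidence_dir: Path,
    output_dir: Path,
    memory: str = "2g",
    cpus: str = "2",
    timeout_s: int = 300,
) -> list[str]:
    """Build a podman invocation for one sandboxed tool run (illustrative defaults)."""
    return [
        "podman", "run", "--rm",
        "--network=none",              # no network namespace connectivity at all
        "--read-only",                 # container root filesystem is read-only
        "--cap-drop=all",              # drop all Linux capabilities
        f"--memory={memory}",          # memory ceiling
        f"--cpus={cpus}",              # CPU ceiling
        "--pids-limit=256",            # bound the number of processes
        f"--timeout={timeout_s}",      # container is killed after this many seconds
        f"--volume={evidence_dir}:/evidence:ro",  # evidence mounted read-only
        f"--volume={output_dir}:/output:rw",      # the only writable location
        image,
        *tool_args,
    ]
```

The resulting list can be passed to `subprocess.run`. Disk I/O throttling is left out here because Podman's `--device-read-bps` and `--device-write-bps` flags need a concrete device path that depends on the host.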

State Management

Agent state (conversation history, working memory, investigation context) is:

▶Encrypted at rest using keys managed by the HSM
▶Versioned, so state can be rolled back if the agent goes off track
▶Auditable, so investigators can review the agent's reasoning
▶Segregated by investigation (no cross-contamination between cases)
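
The versioning and segregation properties can be illustrated with a small in-memory store. This is a sketch under simplifying assumptions: encryption and HSM key handling are omitted entirely, the class names are hypothetical, and a real store would persist snapshots to disk.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class InvestigationState:
    """Versioned state for a single investigation (encryption omitted)."""
    case_id: str
    _versions: list = field(default_factory=list)

    def commit(self, state: dict) -> int:
        """Store a snapshot copy and return its version number."""
        self._versions.append(copy.deepcopy(state))
        return len(self._versions) - 1

    def load(self, version: int = -1) -> dict:
        """Return a copy of the requested snapshot (latest by default)."""
        return copy.deepcopy(self._versions[version])

    def rollback(self, version: int) -> dict:
        """Re-commit an earlier snapshot as the newest version.

        History is never truncated, so the abandoned steps remain reviewable.
        """
        if not 0 <= version < len(self._versions):
            raise IndexError(f"no version {version} for case {self.case_id}")
        restored = self.load(version)
        self.commit(restored)
        return restored


class StateStore:
    """Keeps one independent history per investigation."""

    def __init__(self) -> None:
        self._cases: dict[str, InvestigationState] = {}

    def for_case(self, case_id: str) -> InvestigationState:
        if case_id not in self._cases:
            self._cases[case_id] = InvestigationState(case_id)
        return self._cases[case_id]
```

Rolling back by appending, rather than deleting later versions, keeps the state history consistent with the auditability requirement.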

Audit Trail

Every agent action generates an audit record:

▶Timestamp (NTP-synchronized within the isolated network, or GPS-based if fully air-gapped)
▶Action type and parameters
▶Tool invoked and tool output
▶Agent reasoning (the model's chain of thought, if available)
▶Resource access log

Audit records are written to WORM storage. They form part of the chain of custody for any evidence the agent processes.
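
A record with these fields might look like the sketch below. Linking each record to the hash of the previous one is an added assumption, not something the architecture above specifies; it is a common way to make tampering detectable even before records reach WORM storage. Field and function names are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    """Hash every field except the record's own hash, in a stable key order."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_record(prev_hash, action, params, tool, tool_output,
                reasoning=None, resources=()):
    """Build one audit record linked to its predecessor."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "params": params,
        "tool": tool,
        "tool_output": tool_output,
        "reasoning": reasoning,          # may be None if the model exposes none
        "resources": list(resources),
        "prev_hash": prev_hash,
    }
    record["hash"] = _digest(record)
    return record


def verify_chain(records) -> bool:
    """Check that every record is intact and points at the one before it."""
    prev = GENESIS
    for rec in records:
        if rec["prev_hash"] != prev or rec["hash"] != _digest(rec):
            return False
        prev = rec["hash"]
    return True
```

With this layout, altering or removing any earlier record causes `verify_chain` to fail for the log from that point on.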

Deployment Models

Model A: Fully Air-Gapped

No network connectivity whatsoever. Evidence enters and results exit via physical media. The agent operates entirely within the isolated system.

Best for: classified evidence, maximum security requirements.

Model B: Data Diode

One-way network connectivity allows evidence ingestion from external sources, but no data can leave the system via the network. Results are exported via physical media or separate approved channels.

Best for: real-time evidence capture with secure processing.

Model C: Strict Firewall

Network connectivity exists but is restricted to specific, allowlisted destinations (e.g., time servers, certificate authorities). All traffic is logged and inspected.

Best for: operations requiring some external connectivity (timestamp servers, update repositories) while maintaining isolation from the general internet.
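
The outbound rules for the three models can be summarized in a few lines. This is only a policy-level illustration: real enforcement belongs in the firewall or diode hardware, and the hostnames below are placeholders rather than actual destinations.

```python
from enum import Enum


class Deployment(Enum):
    AIR_GAPPED = "A"
    DATA_DIODE = "B"
    STRICT_FIREWALL = "C"


# Placeholder destinations; a real allowlist would name the site's own servers.
EGRESS_ALLOWLIST = {
    ("time.example.org", 123),
    ("updates.example.org", 443),
}


def egress_permitted(mode: Deployment, host: str, port: int) -> bool:
    """Outbound traffic is possible only under Model C, and only to allowlisted pairs."""
    if mode is not Deployment.STRICT_FIREWALL:
        # Models A and B never send data out over the network.
        return False
    return (host, port) in EGRESS_ALLOWLIST
```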

Performance Considerations

Self-hosted agents are typically slower than cloud-based agents. A fixed pool of local hardware delivers lower inference throughput than a cloud provider's fleet, and an isolated environment cannot scale out to distributed compute on demand. TCI addresses this through:

▶Efficient model selection (choosing the smallest model that meets accuracy requirements)
▶Batch processing (queuing tasks for efficient GPU utilization)
▶Preprocessing optimization (extracting text, normalizing formats before model processing)
▶Caching (storing model responses for repeated patterns to avoid redundant inference)
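
The caching item can be sketched as a wrapper around whatever function performs inference. Several things here are assumptions: the `infer` callable stands in for the local model server, the cache key includes the case identifier so cached responses stay segregated by investigation, and reuse is only sound when decoding is deterministic (for example, temperature 0).

```python
import hashlib
import json
from typing import Callable


class ResponseCache:
    """Memoize model responses keyed by case, model, and prompt."""

    def __init__(self, infer: Callable[[str, str], str]) -> None:
        self._infer = infer              # hypothetical (model, prompt) -> text
        self._store: dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def _key(case_id: str, model: str, prompt: str) -> str:
        payload = json.dumps([case_id, model, prompt])
        return hashlib.sha256(payload.encode()).hexdigest()

    def generate(self, case_id: str, model: str, prompt: str) -> str:
        key = self._key(case_id, model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: run inference once and remember the result.
        result = self._infer(model, prompt)
        self._store[key] = result
        return result
```

Keying on the case identifier costs some hit rate, but it avoids one investigation's outputs being served to another.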

Conclusion

Self-hosted agent infrastructure is harder to build, harder to maintain, and less capable than cloud alternatives. But for organizations handling evidence where data sovereignty is non-negotiable, it's the only option that meets the security requirements while still delivering the productivity gains that AI agents provide.

TCI's architecture balances these constraints through layered isolation, granular permission controls, and comprehensive audit logging.