Building an Autonomous Evidence Capture Agent with Claude Code

This is a field report on building a working evidence capture agent with Claude Code as the agentic CLI framework. The goal: give the agent a target URL and have it capture the page and all linked resources, hash everything, timestamp it, and store the artifacts in WORM-compliant storage, all without human intervention.

Architecture Overview

The agent operates in three phases:

▶Planning: Analyze the target, determine capture strategy, identify linked resources
▶Collection: Execute capture using appropriate tools, generate artifacts
▶Preservation: Hash, timestamp, and store everything with full chain of custody

Each phase produces structured output that feeds the next.
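
As a rough illustration of what "structured output" between phases could look like, here is a minimal sketch using dataclasses serialized to JSON. The class and field names (CapturePlan, CollectionResult, strategy, and so on) are assumptions made for this example, not the schema the agent actually uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CapturePlan:
    """Hypothetical output of the planning phase."""
    target_url: str
    strategy: str  # "static" or "dynamic" (names assumed for this sketch)
    linked_resources: list[str] = field(default_factory=list)


@dataclass
class CollectionResult:
    """Hypothetical output of the collection phase; embeds the plan it ran."""
    plan: CapturePlan
    artifacts: list[str] = field(default_factory=list)  # paths of captured files
    failures: list[str] = field(default_factory=list)   # URLs that could not be captured


def to_json(record) -> str:
    """Serialize a phase record (nested dataclasses included) with stable key order."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


plan = CapturePlan("https://example.com/", "static", ["https://example.com/a.css"])
result = CollectionResult(plan=plan, artifacts=["capture-0001.warc.gz"])
print(to_json(result))
```

Embedding the plan inside the collection result means the preservation phase receives one record describing both what was intended and what was actually captured.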

Phase 1: Planning

The agent receives a target URL and first performs reconnaissance:

▶HTTP HEAD request to determine content type and server configuration
▶DOM analysis to identify JavaScript rendering requirements
▶Link extraction to identify resources that should be captured alongside the primary target
▶Robots.txt and Terms of Service check for compliance
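
The link extraction and robots.txt steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the agent's actual code: the user agent string is hypothetical, the "same host only" scope rule is one possible policy, and the Terms of Service check is not covered because it does not reduce to a simple rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against the page URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def plan_links(page_url: str, html: str, robots_txt: str,
               user_agent: str = "evidence-capture-agent"):
    """Split linked resources into (in_scope, skipped).

    Assumed policy: a link is in scope only if it is on the same host as the
    target page and robots.txt permits fetching it.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    host = urlparse(page_url).netloc
    in_scope, skipped = [], []
    for url in dict.fromkeys(collector.links):  # de-duplicate, keep first-seen order
        same_host = urlparse(url).netloc == host
        if same_host and robots.can_fetch(user_agent, url):
            in_scope.append(url)
        else:
            skipped.append(url)
    return in_scope, skipped
```

A strict same-host rule will skip assets served from a CDN, so a real deployment would likely need an explicit allowlist of additional hosts. Keeping the skipped list, rather than discarding it, lets the report show what was deliberately left out.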

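The planning phase matters because the right capture tool depends on how the page renders: a static HTML page can be fetched directly, while a JavaScript SPA has to be rendered in a browser before there is anything meaningful to capture. The agent makes this determination autonomously.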
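
One way such a determination could be made is a simple heuristic over the Content-Type header and the fetched HTML. The sketch below is an assumption for illustration: the rule ("has scripts but very little visible text") and the 200-character threshold are hypothetical, and the agent's real decision logic is not documented here.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Count <script> tags and visible text characters outside script/style."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def choose_strategy(content_type: str, html: str, min_text_chars: int = 200) -> str:
    """Return "static" or "dynamic" (labels assumed for this sketch)."""
    if "html" not in content_type.lower():
        return "static"  # PDFs, images, JSON: nothing to render
    stats = PageStats()
    stats.feed(html)
    stats.close()
    # Scripts present but almost no server-rendered text suggests an SPA shell.
    if stats.script_count > 0 and stats.text_chars < min_text_chars:
        return "dynamic"
    return "static"
```

A heuristic like this will misclassify some pages, which is consistent with needing a fallback: if a static capture comes back nearly empty, retrying with the browser-based path is a reasonable next step.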

Phase 2: Collection

Based on the plan, the agent selects and executes capture tools:

For static content:

▶wget with WARC output for the primary page and linked resources
▶curl for individual API responses or data files
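
For the wget path, the command can be built as an argument list so that the exact invocation can be logged verbatim. The flags below are standard GNU Wget options; the function name, the WARC prefix, and the default retry and timeout values are assumptions for this sketch.

```python
import shlex


def wget_warc_command(url: str, warc_prefix: str,
                      tries: int = 3, timeout_s: int = 30) -> list[str]:
    """Build a wget invocation that records the page and its requisites to a WARC."""
    return [
        "wget",
        "--page-requisites",           # also fetch images, CSS, and scripts the page needs
        f"--warc-file={warc_prefix}",  # wget appends the .warc.gz extension itself
        f"--tries={tries}",
        f"--timeout={timeout_s}",
        "--no-verbose",
        url,
    ]


cmd = wget_warc_command("https://example.com/", "capture-0001")
print(shlex.join(cmd))  # the string that would go into the session log
```

Passing the list to subprocess.run(cmd) without shell=True avoids quoting problems when a target URL contains characters such as & or ?.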

For dynamic content:

▶Browsertrix for JavaScript-rendered pages
▶Screenshot capture as supplementary evidence

For all captures:

▶Full HTTP headers preserved
▶Response bodies stored with original encoding
▶Timing information recorded

The agent monitors each capture for completeness. If a resource fails to download, the agent retries with different parameters. If the retries also fail, it logs the failure and moves on to the next resource.
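
The retry behavior can be sketched as a loop over parameter profiles, recording every attempt so the failure ends up in the log rather than being silently dropped. The profiles and the fetch callable are hypothetical stand-ins: in practice fetch would wrap wget, curl, or a browser capture.

```python
from typing import Callable

# Hypothetical parameter profiles, tried in order until one succeeds.
RETRY_PROFILES = [
    {"timeout_s": 30, "user_agent": "default"},
    {"timeout_s": 90, "user_agent": "default"},
    {"timeout_s": 90, "user_agent": "browser-like"},
]


def capture_with_retries(url: str,
                         fetch: Callable[[str, dict], bool],
                         profiles: list[dict] = RETRY_PROFILES) -> dict:
    """Try each profile in turn; return a record of every attempt made."""
    attempts = []
    for profile in profiles:
        try:
            ok = bool(fetch(url, profile))
            error = None
        except Exception as exc:  # a capture tool crash counts as a failed attempt
            ok, error = False, repr(exc)
        attempts.append({"profile": profile, "ok": ok, "error": error})
        if ok:
            break
    captured = bool(attempts) and attempts[-1]["ok"]
    return {"url": url, "captured": captured, "attempts": attempts}
```

Returning the full attempt history, not just a success flag, gives the session log the "how errors were handled" detail for each resource.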

Phase 3: Preservation

Once collection completes, the preservation pipeline runs automatically:

▶Each artifact receives a SHA-256 hash
▶A manifest file lists all artifacts with their hashes
▶The manifest itself is hashed
▶An RFC 3161 timestamp is obtained for the manifest hash
▶An OpenTimestamps attestation is created as a secondary verification
▶Everything is written to S3 with Object Lock in compliance mode
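
The first three steps (hash each artifact, list the hashes in a manifest, hash the manifest) can be sketched as follows. The manifest layout and function names are assumptions for this example; the timestamping and S3 upload steps are left out because they depend on external services.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large WARC files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir: Path) -> tuple[bytes, str]:
    """Return (manifest_bytes, manifest_sha256) for every file under artifact_dir."""
    entries = [
        {
            "path": p.relative_to(artifact_dir).as_posix(),
            "sha256": sha256_file(p),
            "size": p.stat().st_size,
        }
        for p in sorted(artifact_dir.rglob("*"))
        if p.is_file()
    ]
    # Sorted paths and sorted keys make the manifest bytes reproducible.
    manifest_bytes = json.dumps(
        {"artifacts": entries}, sort_keys=True, indent=2
    ).encode("utf-8")
    return manifest_bytes, hashlib.sha256(manifest_bytes).hexdigest()
```

The returned manifest digest is the value that would be submitted for the RFC 3161 timestamp and the OpenTimestamps attestation. Write the manifest outside artifact_dir, or before anything else is added to it, so it does not end up listing itself. For the upload, boto3's put_object accepts ObjectLockMode="COMPLIANCE" together with ObjectLockRetainUntilDate; the retention period itself is a policy decision this sketch does not make.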

The agent generates a human-readable report alongside the technical artifacts: what was captured, when, what tools were used, and what the chain of custody looks like.

Session Logging

Every action the agent takes is logged to an immutable session log:

▶Commands executed (with full arguments)
▶Tool output (truncated for large responses, with full output archived)
▶Decision points: why the agent chose one approach over another
▶Errors encountered and how they were handled
▶Timing for each operation

This session log is itself hashed and timestamped, creating an auditable record of the entire capture process.
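
One common way to make such a log tamper-evident is a hash chain, where each entry records the hash of the entry before it. The text above does not say the agent's log is built this way, so treat the following as an assumed design with hypothetical field names, not a description of the actual implementation.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_digest(entry: dict) -> str:
    """Hash every field of an entry except its own entry_hash."""
    payload = {k: v for k, v in entry.items() if k != "entry_hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], kind: str, detail: dict) -> dict:
    """Append an entry that commits to the hash of the previous entry."""
    entry = {
        "seq": len(log),
        "time": datetime.now(timezone.utc).isoformat(),
        "kind": kind,      # e.g. "command", "decision", "error"
        "detail": detail,  # must be JSON-serializable
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry["entry_hash"] = entry_digest(entry)
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["entry_hash"] != entry_digest(entry):
            return False
        prev = entry["entry_hash"]
    return True
```

With this structure, timestamping only the final entry_hash commits to the whole session, since that hash depends on every earlier entry.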

What Worked

▶Autonomous tool selection: The agent reliably chose the right capture method for different content types
▶Error recovery: Failed downloads were retried with modified parameters without human intervention
▶Comprehensive logging: The chain of custody documentation was more thorough than what a human analyst typically produces
▶Speed: A capture that would take an analyst 30-45 minutes completed in under 5 minutes

What Needs Improvement

▶Scope creep: The agent occasionally followed links too aggressively, capturing content outside the intended scope
▶JavaScript-heavy sites: Some SPAs required multiple capture attempts before the agent found the right configuration
▶Storage costs: Aggressive capture of all linked resources produces large artifact sets, so smarter filtering is needed

Conclusion

Agentic CLI frameworks are viable platforms for evidence capture automation. The key insight is that the agent does not replace the analyst. It replaces the tedious execution layer, while the analyst focuses on defining objectives and validating results. TCI is integrating this pattern into our standard capture infrastructure.