Building an Autonomous Evidence Capture Agent with Claude Code

David Greenhill, Technical Lead

This is a field report on building a functional evidence capture agent using Claude Code as the agentic CLI framework. The goal: give the agent a target URL, have it capture the page and all linked resources, hash everything, timestamp it, and store the artifacts in WORM-compliant storage — all autonomously.

Architecture Overview

The agent operates in three phases:

  • Planning: Analyze the target, determine capture strategy, identify linked resources
  • Collection: Execute capture using appropriate tools, generate artifacts
  • Preservation: Hash, timestamp, and store everything with full chain of custody

Each phase produces structured output that feeds the next.
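
As an illustration of what "structured output that feeds the next" could look like, here is a minimal sketch using Python dataclasses. The class and field names (CapturePlan, CollectionResult, PreservationRecord) are hypothetical and not taken from the actual agent; the point is only that each phase's output wraps the previous one and serializes cleanly.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CapturePlan:
    """Hypothetical output of the planning phase."""
    target_url: str
    strategy: str  # assumed values: "static" or "dynamic"
    linked_resources: list[str] = field(default_factory=list)


@dataclass
class CollectionResult:
    """Hypothetical output of the collection phase."""
    plan: CapturePlan
    artifact_paths: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)


@dataclass
class PreservationRecord:
    """Hypothetical output of the preservation phase."""
    collection: CollectionResult
    manifest_sha256: str = ""


plan = CapturePlan("https://example.com/", "static", ["https://example.com/style.css"])
result = CollectionResult(plan, artifact_paths=["capture.warc.gz"])
record = PreservationRecord(result, manifest_sha256="0" * 64)  # placeholder hash

# asdict() recurses into nested dataclasses, so the whole chain becomes plain JSON.
serialized = json.dumps(asdict(record), indent=2)
```

Nesting the records this way means the final preservation record carries the plan and collection details with it, which suits a chain-of-custody report.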

Phase 1: Planning

The agent receives a target URL and first performs reconnaissance:

  • HTTP HEAD request to determine content type and server configuration
  • DOM analysis to identify JavaScript rendering requirements
  • Link extraction to identify resources that should be captured alongside the primary target
  • Robots.txt and Terms of Service check for compliance
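
The reconnaissance steps above can be sketched with the Python standard library alone. This is an assumption-laden illustration, not the agent's actual code: the function names and the user agent string "evidence-capture-agent" are hypothetical, and the Terms of Service check is not covered because it does not reduce to a simple library call.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


def head_content_type(url, timeout=10):
    """Issue an HTTP HEAD request and return the reported content type."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return linked resources found in the HTML, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def allowed_by_robots(robots_txt, url, user_agent="evidence-capture-agent"):
    """Check a URL against the text of an already downloaded robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


sample_html = (
    '<a href="/about">About</a><img src="logo.png">'
    '<script src="https://cdn.example.net/app.js"></script>'
)
links = extract_links(sample_html, "https://example.com/docs/")
robots = "User-agent: *\nDisallow: /private/"
```

The sample deliberately avoids network access; head_content_type is defined but not called here.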

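The planning phase determines everything downstream. A static HTML page needs different capture tools than a JavaScript single-page application (SPA), and a plain fetch of a JavaScript-rendered page records only the unrendered shell. The agent makes this determination autonomously.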
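
How the agent actually decides between static and dynamic capture is not specified here. One crude heuristic, offered purely as an assumption, is to compare the amount of server-rendered text against the presence of scripts. The function name choose_strategy and the 50-word threshold are hypothetical.

```python
import re


def choose_strategy(content_type, html, min_words=50):
    """Return 'static' or 'dynamic' using a rough, illustrative heuristic."""
    if not content_type.startswith("text/html"):
        # PDFs, JSON, images and similar can be fetched as raw bytes.
        return "static"
    # Drop script and style bodies, then all remaining tags, to estimate
    # how much readable text the server delivered without JavaScript.
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    visible_text = re.sub(r"(?s)<[^>]+>", " ", without_code)
    word_count = len(visible_text.split())
    script_count = len(re.findall(r"(?i)<script\b", html))
    if script_count > 0 and word_count < min_words:
        return "dynamic"  # mostly empty shell plus scripts: likely an SPA
    return "static"


article = "<html><body><p>" + "word " * 200 + "</p></body></html>"
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
```

A heuristic like this will misclassify some pages, which is consistent with the multiple capture attempts some SPAs needed.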

Phase 2: Collection

Based on the plan, the agent selects and executes capture tools:

For static content:

  • wget with WARC output for the primary page and linked resources
  • curl for individual API responses or data files

For dynamic content:

  • Browsertrix for JavaScript-rendered pages
  • Screenshot capture as supplementary evidence

For all captures:

  • Full HTTP headers preserved
  • Response bodies stored with original encoding
  • Timing information recorded

The agent monitors each capture for completeness. If a resource fails to download, it retries with different parameters before moving on and logging the failure.
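
The retry behaviour described above could look roughly like this. It is a sketch under stated assumptions: the fetch callable, the parameter names (timeout, user_agent), and the failure record format are all hypothetical stand-ins for whatever the agent really varies between attempts.

```python
import time


def capture_with_retries(fetch, url, parameter_sets, delay_seconds=0.0):
    """Try each parameter set in order.

    Returns (result, failures). result is None if every attempt failed;
    failures lists one record per failed attempt for the session log.
    """
    failures = []
    for params in parameter_sets:
        try:
            return fetch(url, **params), failures
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            failures.append({"url": url, "params": params, "error": repr(exc)})
            time.sleep(delay_seconds)
    return None, failures


def fake_fetch(url, timeout, user_agent):
    """Stand-in for a real capture tool; fails when the timeout is short."""
    if timeout < 30:
        raise TimeoutError(f"timed out after {timeout}s")
    return b"<html>ok</html>"


variants = [
    {"timeout": 10, "user_agent": "capture-agent"},
    {"timeout": 60, "user_agent": "capture-agent"},
]
body, failures = capture_with_retries(fake_fetch, "https://example.com/", variants)
```

Returning the failure records alongside the result keeps the "log the failure and move on" step explicit instead of hiding it inside the retry loop.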

Phase 3: Preservation

Once collection completes, the preservation pipeline runs automatically:

  • Each artifact receives a SHA-256 hash
  • A manifest file lists all artifacts with their hashes
  • The manifest itself is hashed
  • RFC 3161 timestamps are obtained for the manifest hash
  • OpenTimestamps attestation is created as a secondary verification
  • Everything is written to S3 with Object Lock in compliance mode
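
The first three steps of this pipeline (per-artifact hashes, a manifest, and a hash of the manifest) can be sketched with the standard library. The manifest layout and function names below are assumptions made for illustration; the report does not specify the actual manifest format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir):
    """Return (manifest_bytes, manifest_sha256) for every file under artifact_dir."""
    root = Path(artifact_dir)
    entries = [
        {
            "path": path.relative_to(root).as_posix(),
            "sha256": sha256_file(path),
            "bytes": path.stat().st_size,
        }
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]
    # sort_keys gives a stable byte representation, so the manifest hash is reproducible.
    manifest_bytes = json.dumps({"artifacts": entries}, sort_keys=True, indent=2).encode("utf-8")
    return manifest_bytes, hashlib.sha256(manifest_bytes).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "page.html").write_bytes(b"hello")
    manifest_bytes, manifest_sha256 = build_manifest(tmp)
```

The manifest hash is the value that would then be submitted for RFC 3161 and OpenTimestamps attestation; those steps and the S3 write are not shown here.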

The agent generates a human-readable report alongside the technical artifacts: what was captured, when, what tools were used, and what the chain of custody looks like.

Session Logging

Every action the agent takes is logged to an immutable session log:

  • Commands executed (with full arguments)
  • Tool output (truncated for large responses, with full output archived)
  • Decision points: why the agent chose one approach over another
  • Errors encountered and how they were handled
  • Timing for each operation

This session log is itself hashed and timestamped, creating an auditable record of the entire capture process.
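
One common way to make a log tamper-evident, shown here as an assumption (the report does not say how the session log is made immutable), is to chain entries by hash so that each entry commits to the one before it. The field names and the all-zero genesis value are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def _digest(body):
    """Hash an entry body using a stable JSON encoding."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, event, detail):
    """Append an entry that includes the hash of the previous entry."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "detail": detail,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = dict(body, entry_hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True only if every entry hash and every back-link is intact."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "entry_hash"}
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != _digest(body):
            return False
        prev_hash = entry["entry_hash"]
    return True


session_log = []
append_entry(session_log, "command", "wget --warc-file=capture-001 https://example.com/")
append_entry(session_log, "decision", "static capture chosen: page renders without JavaScript")
chain_ok = verify_chain(session_log)
```

Chaining detects edits after the fact but does not prevent them; hashing and timestamping the final log, as described above, is what anchors it to a point in time.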

What Worked

  • Autonomous tool selection: The agent reliably chose the right capture method for different content types
  • Error recovery: Failed downloads were retried with modified parameters without human intervention
  • Comprehensive logging: The chain of custody documentation was more thorough than what a human analyst typically produces
  • Speed: A capture that would take an analyst 30-45 minutes completed in under 5 minutes

What Needs Improvement

  • Scope creep: The agent occasionally followed links too aggressively, capturing content outside the intended scope
  • JavaScript-heavy sites: Some SPAs required multiple capture attempts before the agent found the right configuration
  • Storage costs: Aggressive capture of all linked resources produces large artifact sets; smarter filtering is needed
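
A possible mitigation for the scope and storage issues listed above is an explicit allowlist applied to every extracted link before capture. This is a sketch of one approach, not something the current agent is stated to do; the function name in_scope and its parameters are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(url, allowed_hosts, path_prefix="/"):
    """Accept only http(s) URLs on an allowed host and under a path prefix."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # skips mailto:, javascript:, data: and similar
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return host in allowed_hosts and path.startswith(path_prefix)


candidates = [
    "https://example.com/news/story-1",
    "https://example.com/about",
    "https://ads.example.net/banner.js",
    "mailto:press@example.com",
]
scoped = [url for url in candidates if in_scope(url, {"example.com"}, "/news/")]
```

A strict filter like this trades completeness for predictability: page requisites served from other hosts, such as CDN-hosted stylesheets, would need to be added to the allowlist deliberately.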

Conclusion

Agentic CLI frameworks are viable platforms for evidence capture automation. The key insight is that the agent isn't replacing the analyst; it's replacing the tedious execution layer while the analyst focuses on defining objectives and validating results. The Commonlight Initiative (TCI) is integrating this pattern into its standard capture infrastructure.

Written by

David Greenhill

Technical Lead, The Commonlight Initiative
