This is a field report on building a functional evidence capture agent using Claude Code as the agentic CLI framework. The goal: give the agent a target URL, have it capture the page and all linked resources, hash everything, timestamp it, and store the artifacts in WORM-compliant storage — all autonomously.
Architecture Overview
The agent operates in three phases:
- ▶Planning: Analyze the target, determine capture strategy, identify linked resources
- ▶Collection: Execute capture using appropriate tools, generate artifacts
- ▶Preservation: Hash, timestamp, and store everything with full chain of custody
Each phase produces structured output that feeds the next.
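As a rough illustration of that handoff, the structured output could be modeled as plain records that serialize to JSON. This is a minimal sketch, not the agent's actual schema: the class names (`CapturePlan`, `Artifact`, `CollectionResult`) and their fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CapturePlan:
    """Hypothetical output of the planning phase."""
    target_url: str
    needs_js_rendering: bool
    linked_resources: list[str] = field(default_factory=list)


@dataclass
class Artifact:
    """Hypothetical record for one captured file."""
    path: str
    source_url: str
    tool: str


@dataclass
class CollectionResult:
    """Hypothetical output of the collection phase, input to preservation."""
    plan: CapturePlan
    artifacts: list[Artifact] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)


def to_json(record) -> str:
    # asdict() recurses into nested dataclasses, so the plan travels with the result.
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Carrying the plan inside the collection result means the preservation phase can record not only what was captured but also what the agent intended to capture.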
Phase 1: Planning
The agent receives a target URL and first performs reconnaissance:
- ▶HTTP HEAD request to determine content type and server configuration
- ▶DOM analysis to identify JavaScript rendering requirements
- ▶Link extraction to identify resources that should be captured alongside the primary target
- ▶robots.txt and Terms of Service checks for compliance
This planning phase is critical. A static HTML page needs different capture tools than a JavaScript SPA. The agent makes this determination autonomously.
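The static-versus-dynamic decision can be approximated with a simple heuristic over the fetched HTML. The sketch below is an assumption about how such a check might look, not the agent's actual logic: `plan_capture` and the `min_text_chars` threshold are hypothetical, and the HEAD request and robots.txt check are left out so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageScanner(HTMLParser):
    """Collects href/src links and counts scripts and visible text."""

    SKIP_TEXT_IN = {"script", "style"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.script_count = 0
        self.text_chars = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in self.SKIP_TEXT_IN:
            self._skip_depth += 1
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TEXT_IN and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def plan_capture(url, html, min_text_chars=200):
    """Return a hypothetical plan dict for the collection phase."""
    scanner = _PageScanner(url)
    scanner.feed(html)
    # Heuristic (assumption): scripts present but almost no text suggests an SPA shell.
    needs_js = scanner.script_count > 0 and scanner.text_chars < min_text_chars
    return {
        "target_url": url,
        "needs_js_rendering": needs_js,
        "linked_resources": list(dict.fromkeys(scanner.links)),  # de-duplicate, keep order
    }
```

A text-length threshold is crude: it will misjudge short static pages that happen to load a script, so in practice a heuristic like this would be one input to the decision rather than the decision itself.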
Phase 2: Collection
Based on the plan, the agent selects and executes capture tools:
For static content:
- ▶wget with WARC output for the primary page and linked resources
- ▶curl for individual API responses or data files
For dynamic content:
- ▶Browsertrix for JavaScript-rendered pages
- ▶Screenshot capture as supplementary evidence
For all captures:
- ▶Full HTTP headers preserved
- ▶Response bodies stored with original encoding
- ▶Timing information recorded
The agent monitors each capture for completeness. If a resource fails to download, it retries with different parameters; if the retries also fail, it logs the failure and moves on.
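The retry behavior for the static-content path might look like the following sketch. It assumes wget is the capture tool and uses its `--warc-file`, `--page-requisites`, `--tries`, `--timeout`, and `--user-agent` options; the function names, the parameter sets, and the injectable `run` argument are hypothetical choices made for illustration.

```python
import subprocess


def build_wget_command(url, warc_prefix, tries=3, timeout=30, user_agent=None):
    """Build a wget invocation that writes a WARC alongside the download."""
    cmd = [
        "wget",
        "--page-requisites",            # also fetch images, CSS, and scripts the page needs
        f"--warc-file={warc_prefix}",   # wget appends the .warc.gz extension itself
        f"--tries={tries}",
        f"--timeout={timeout}",
    ]
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    cmd.append(url)
    return cmd


def capture_with_retries(url, warc_prefix, parameter_sets, run=subprocess.run):
    """Try each parameter set in order; record every failed attempt."""
    failures = []
    for params in parameter_sets:
        cmd = build_wget_command(url, warc_prefix, **params)
        completed = run(cmd, capture_output=True, text=True)
        if completed.returncode == 0:
            return {"ok": True, "command": cmd, "failures": failures}
        failures.append({"command": cmd, "returncode": completed.returncode})
    # Every attempt failed: the caller logs the failures and moves on.
    return {"ok": False, "command": None, "failures": failures}
```

Passing `run` as an argument keeps the retry logic testable without invoking wget, and returning the failed commands with their exit codes gives the session log something concrete to record.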
Phase 3: Preservation
Once collection completes, the preservation pipeline runs automatically:
- ▶Each artifact receives a SHA-256 hash
- ▶A manifest file lists all artifacts with their hashes
- ▶The manifest itself is hashed
- ▶RFC 3161 timestamps are obtained for the manifest hash
- ▶An OpenTimestamps attestation is created as secondary verification
- ▶Everything is written to S3 with Object Lock in compliance mode
The agent generates a human-readable report alongside the technical artifacts: what was captured, when, what tools were used, and what the chain of custody looks like.
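The first three preservation steps (hash each artifact, list the hashes in a manifest, hash the manifest) can be sketched with the standard library alone. The manifest layout and the function names here are assumptions for illustration; the report does not specify the actual format.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large WARCs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir):
    """Return (manifest_bytes, manifest_sha256) for every file under artifact_dir."""
    root = Path(artifact_dir)
    entries = [
        {
            "path": p.relative_to(root).as_posix(),
            "sha256": sha256_file(p),
            "size": p.stat().st_size,
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    # Sorted keys and sorted paths keep the serialized manifest deterministic,
    # so the same artifact set always yields the same manifest hash.
    manifest_bytes = json.dumps({"artifacts": entries}, sort_keys=True, indent=2).encode("utf-8")
    return manifest_bytes, hashlib.sha256(manifest_bytes).hexdigest()
```

The manifest hash is the value that would then be submitted for timestamping, for example with `openssl ts -query -digest <hex> -sha256`. The upload step would set boto3's `ObjectLockMode="COMPLIANCE"` together with an `ObjectLockRetainUntilDate` on `put_object`. Neither step is shown here because both depend on external services.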
Session Logging
Every action the agent takes is logged to an immutable session log:
- ▶Commands executed (with full arguments)
- ▶Tool output (truncated for large responses, with full output archived)
- ▶Decision points: why the agent chose one approach over another
- ▶Errors encountered and how they were handled
- ▶Timing for each operation
This session log is itself hashed and timestamped, creating an auditable record of the entire capture process.
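One way to make such a log tamper-evident before it is hashed and timestamped is to chain the entries, so that each entry commits to the hash of the one before it. The report does not say the agent does this; the sketch below is an assumption, and `SessionLog` and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class SessionLog:
    """Append-only log in which every entry includes the previous entry's hash."""

    def __init__(self):
        self.entries = []
        self._prev_hash = GENESIS_HASH

    def record(self, kind, detail, timestamp=None):
        body = {
            "seq": len(self.entries),
            "kind": kind,  # e.g. "command", "decision", "error"
            "detail": detail,
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._prev_hash,
        }
        entry = dict(body, hash=_entry_hash(body))
        self.entries.append(entry)
        self._prev_hash = entry["hash"]
        return entry

    def verify(self):
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

With a chain like this, timestamping only the final entry's hash is enough to commit to the whole session, because that hash depends on every entry before it.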
What Worked
- ▶Autonomous tool selection: The agent reliably chose the right capture method for different content types
- ▶Error recovery: Failed downloads were retried with modified parameters without human intervention
- ▶Comprehensive logging: The chain of custody documentation was more thorough than what a human analyst typically produces
- ▶Speed: A capture that would take an analyst 30-45 minutes completed in under 5 minutes
What Needs Improvement
- ▶Scope creep: The agent occasionally followed links too aggressively, capturing content outside the intended scope
- ▶JavaScript-heavy sites: Some SPAs required multiple capture attempts before the agent found the right configuration
- ▶Storage costs: Aggressive capture of all linked resources produces large artifact sets — smarter filtering needed
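A possible starting point for the scope and filtering problems is an explicit scope predicate applied to every candidate link before capture. This is only a sketch of one option, not something the current agent implements; `in_scope`, `allowed_hosts`, and `max_path_depth` are hypothetical names.

```python
from urllib.parse import urlsplit


def in_scope(url, seed_url, allowed_hosts=frozenset(), max_path_depth=None):
    """Decide whether a discovered link should be captured.

    A link is in scope when it is HTTP(S), its host is the seed's host or an
    explicitly allowed one, and its path is no deeper than max_path_depth.
    """
    seed = urlsplit(seed_url)
    candidate = urlsplit(url)
    if candidate.scheme not in ("http", "https"):
        return False
    hosts = {seed.hostname} | set(allowed_hosts)
    if candidate.hostname not in hosts:
        return False
    if max_path_depth is not None:
        depth = len([segment for segment in candidate.path.split("/") if segment])
        if depth > max_path_depth:
            return False
    return True
```

Having the analyst supply `allowed_hosts` and a depth limit along with the target URL would keep the scope decision with the person defining the objective, while the agent only enforces it.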
Conclusion
Agentic CLI frameworks are viable platforms for evidence capture automation. The key insight is that the "agent" isn't replacing the analyst — it's replacing the tedious execution layer while the analyst focuses on defining objectives and validating results. TCI is integrating this pattern into our standard capture infrastructure.
