AI Agent Infrastructure (AI Note)

AI Agent Adoption Starts with Evaluation and Context Design, Not Tool Selection

AI agent news is often framed around tool and framework names. In practical adoption, however, results are shaped by what the agent can read, what counts as success, and which failures are unacceptable.

Adding tools does not make an agent learn the job

GitHub is full of OSS for code understanding, long-running agents, multi-persona agents, context integration, and prompt improvement. Each can be useful, but installing one does not automatically improve business quality.

Tool First: start from tools

  • Try OSS first
  • Throw work at the agent
  • Evaluate by feel
  • Fix only the prompt

System First: start from evaluation

  • Define input and output
  • Prepare context
  • Write pass conditions
  • Feed logs back into operations
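The system-first loop above can be sketched in a few lines: define the input, write the pass conditions before running anything, and log every result. This is a hypothetical minimal harness; the `TaskSpec` name and the lambda pass conditions are illustrative, not from any specific framework.

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    task_input: str
    pass_conditions: list          # callables: output -> bool
    logs: list = field(default_factory=list)

    def evaluate(self, output: str) -> bool:
        passed = all(cond(output) for cond in self.pass_conditions)
        # Feed the result back to operations as a structured log entry.
        self.logs.append({"input": self.task_input,
                          "output": output,
                          "passed": passed})
        return passed

spec = TaskSpec(
    task_input="Summarize the incident report",
    pass_conditions=[
        lambda out: len(out) > 0,               # non-empty deliverable
        lambda out: "incident" in out.lower(),  # stays on scope
    ],
)
print(spec.evaluate("Incident summary: root cause was a config typo."))  # → True
```

The point is the ordering: the pass conditions and the log format exist before any agent or OSS tool is chosen, so swapping tools later does not invalidate the evaluation.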

Delegating work to an agent does not mean giving AI freedom. It means designing context, authority, and evaluation.

Agent accuracy is often decided more by accessible context than by the model itself

The push to connect multiple data sources reflects the practical challenge of agent adoption: if internal databases, APIs, documents, GitHub, Notion, and CRM data remain siloed, the AI falls back to generalities.

01. Let it read static rules
Place judgment premises such as CLAUDE.md, AGENTS.md, design policies, and operating rules.

02. Let it read current state
Connect the issues, PRs, customer information, inventory, metrics, and latest documents needed for the current decision.

03. Let it read past judgments
Keep records of why a design was chosen, what failed before, and which criteria were prioritized.
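The three context layers can be sketched as a simple assembly step before each agent run. Everything here is illustrative (the function name, dict keys, and sample values are invented); only CLAUDE.md comes from the article.

```python
# Hypothetical sketch: building an agent's context from the three layers.
def build_context(static_rules: dict, current_state: dict,
                  past_judgments: list) -> str:
    sections = [
        # 1. Static rules: judgment premises such as CLAUDE.md contents.
        "## Rules\n" + "\n".join(static_rules.values()),
        # 2. Current state: the facts needed for this specific decision.
        "## Current state\n" + "\n".join(
            f"{k}: {v}" for k, v in current_state.items()),
        # 3. Past judgments: why earlier choices were made and what failed.
        "## Past judgments\n" + "\n".join(past_judgments),
    ]
    return "\n\n".join(sections)

ctx = build_context(
    {"CLAUDE.md": "Prefer small, reviewable changes."},
    {"open_issue": "#123 flaky integration test"},
    ["Chose retry-with-backoff; plain retry failed under load."],
)
```

Keeping the three layers as separate arguments makes the gap visible: if `past_judgments` is always empty, the agent is deciding without history.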

The bottleneck in agent development becomes evaluation, not generation

AI can produce outputs quickly. That shifts the human-side burden toward judging whether each output is correct, usable, and safe.

What to evaluate

  • Factual accuracy
  • Scope adherence
  • Consistency with existing rules
  • Safe use of authority
  • Completeness as a deliverable

How to evaluate: where to place evaluation

  • Make checklists explicit
  • Turn failures into test cases
  • Do not leave human review only at the end
  • Separate automatic and manual evaluation
  • Feed evaluation results into the next context
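"Turn failures into test cases" can be made concrete with a small regression loop: every observed failure is recorded, then re-run automatically before human review. This is a hypothetical sketch; `record_failure` and `regression_check` are invented names.

```python
# Each recorded failure becomes a regression check against the agent.
failure_cases = []

def record_failure(task_input: str, bad_output: str, reason: str) -> None:
    failure_cases.append({"input": task_input,
                          "bad_output": bad_output,
                          "reason": reason})

def regression_check(agent_fn) -> list:
    """Re-run every recorded failure; return those that still reproduce."""
    return [c for c in failure_cases
            if agent_fn(c["input"]) == c["bad_output"]]

record_failure("list open PRs", "deleted branch",
               "acted outside granted authority")
# Stub agent for illustration; a real agent call would go here.
still_failing = regression_check(lambda task: "listed 3 open PRs")
```

An empty `still_failing` list means no recorded failure reproduced, which is exactly the signal that can be fed back into the next context instead of a gut-feel review.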

Multiple agents are not about increasing headcount; they are about separating roles and responsibility

The idea of multi-persona agents or agent teams resembles human organization design. But increasing roles alone creates confusion.

Role 1: Explorer
Find needed facts from code, documents, and data. Its responsibility is confirmation, not judgment.

Role 2: Worker
Implement within a defined scope. Its responsibility is the requested scope and existing rules, not free expansion.

Role 3: Reviewer
Before the result is used, check failure conditions, diffs, and unverified items as the last safety layer.
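The three-role split can be sketched as a pipeline in which each function owns exactly one responsibility. All function names, data, and failure conditions here are illustrative.

```python
def explorer(question: str, sources: dict) -> dict:
    # Responsibility: confirm facts, not make judgments.
    return {k: v for k, v in sources.items()
            if question.lower() in v.lower()}

def worker(facts: dict, scope: str) -> str:
    # Responsibility: implement within the requested scope only.
    return f"change within '{scope}', based on {len(facts)} confirmed fact(s)"

def reviewer(result: str, failure_conditions: list) -> bool:
    # Responsibility: the last safety gate before the result is used.
    return not any(cond(result) for cond in failure_conditions)

facts = explorer("retry", {"docs": "Retry policy: backoff only."})
draft = worker(facts, scope="fix flaky test")
approved = reviewer(draft, [lambda r: "delete" in r.lower()])  # → True
```

The separation matters more than the count: the worker never reviews its own output, and the reviewer only blocks or passes, mirroring the responsibility boundaries described above.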

As AI agents increase, human work does not simply disappear; it shifts toward designing evaluation criteria and context.

Source Ideas Referenced in This Article

This note integrates source ideas around AI evaluation, context optimization, CLAUDE.md, multi-source context, multi-agent design, and development-support OSS.

  1. AI evaluation becoming a new compute bottleneck
  2. CLAUDE.md as a way to improve coding-agent ability
  3. Agency Agents and multi-persona agent frameworks
  4. Contextualizing agents across multiple data sources