AI Agent Infrastructure (AI Note)

AI Agent Adoption Starts with Evaluation and Context Design, Not Tool Selection

AI agent news is often framed around tool and framework names. In practical adoption, however, results are shaped by what the agent can read, what counts as success, and which failures are unacceptable.

Adding tools does not make an agent learn the job

GitHub is full of OSS for code understanding, long-running agents, multi-persona agents, context integration, and prompt improvement. Each can be useful, but installing one does not automatically improve business quality.

Tool First: start from tools

  • Try OSS first
  • Throw work at the agent
  • Evaluate by feel
  • Fix only the prompt

System First: start from evaluation

  • Define input and output
  • Prepare context
  • Write pass conditions
  • Feed logs back into operations
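The system-first loop above can be sketched in a few lines: define the input, write the pass conditions before running anything, and log every result. This is a hypothetical minimal harness; the `TaskSpec` name and the lambda pass conditions are illustrative, not from any specific framework.

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    task_input: str
    pass_conditions: list          # callables: output -> bool
    logs: list = field(default_factory=list)

    def evaluate(self, output: str) -> bool:
        passed = all(cond(output) for cond in self.pass_conditions)
        # Feed the result back to operations as a structured log entry.
        self.logs.append({"input": self.task_input,
                          "output": output,
                          "passed": passed})
        return passed

spec = TaskSpec(
    task_input="Summarize the incident report",
    pass_conditions=[
        lambda out: len(out) > 0,               # non-empty deliverable
        lambda out: "incident" in out.lower(),  # stays on scope
    ],
)
print(spec.evaluate("Incident summary: root cause was a config typo."))  # → True
```

The point is the ordering: the pass conditions and the log format exist before any agent or OSS tool is chosen, so swapping tools later does not invalidate the evaluation.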

Delegating work to an agent does not mean giving AI freedom. It means designing context, authority, and evaluation.

Agent accuracy is often decided more by accessible context than by the model itself

The push to connect multiple data sources reflects the practical challenge of agent adoption: if internal databases, APIs, documents, GitHub, Notion, and CRM data remain siloed, the AI falls back to generalities.

01. Let it read static rules
Place judgment premises such as CLAUDE.md, AGENTS.md, design policies, and operating rules.

02. Let it read current state
Connect the issues, PRs, customer information, inventory, metrics, and latest documents needed for the current decision.

03. Let it read past judgments
Keep records of why a design was chosen, what failed before, and which criteria were prioritized.
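The three context layers can be sketched as a simple assembly step before each agent run. Everything here is illustrative (the function name, dict keys, and sample values are invented); only CLAUDE.md comes from the article.

```python
# Hypothetical sketch: building an agent's context from the three layers.
def build_context(static_rules: dict, current_state: dict,
                  past_judgments: list) -> str:
    sections = [
        # 1. Static rules: judgment premises such as CLAUDE.md contents.
        "## Rules\n" + "\n".join(static_rules.values()),
        # 2. Current state: the facts needed for this specific decision.
        "## Current state\n" + "\n".join(
            f"{k}: {v}" for k, v in current_state.items()),
        # 3. Past judgments: why earlier choices were made and what failed.
        "## Past judgments\n" + "\n".join(past_judgments),
    ]
    return "\n\n".join(sections)

ctx = build_context(
    {"CLAUDE.md": "Prefer small, reviewable changes."},
    {"open_issue": "#123 flaky integration test"},
    ["Chose retry-with-backoff; plain retry failed under load."],
)
```

Keeping the three layers as separate arguments makes the gap visible: if `past_judgments` is always empty, the agent is deciding without history.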

The bottleneck in agent development becomes evaluation, not generation

AI can produce outputs quickly. That shifts the human-side burden toward judging whether each output is correct, usable, and safe.

What to evaluate

  • Factual accuracy
  • Scope adherence
  • Consistency with existing rules
  • Safe use of authority
  • Completeness as a deliverable

How to evaluate: where to place evaluation

  • Make checklists explicit
  • Turn failures into test cases
  • Do not leave human review only at the end
  • Separate automatic and manual evaluation
  • Feed evaluation results into the next context
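"Turn failures into test cases" can be made concrete with a small regression loop: every observed failure is recorded, then re-run automatically before human review. This is a hypothetical sketch; `record_failure` and `regression_check` are invented names.

```python
# Each recorded failure becomes a regression check against the agent.
failure_cases = []

def record_failure(task_input: str, bad_output: str, reason: str) -> None:
    failure_cases.append({"input": task_input,
                          "bad_output": bad_output,
                          "reason": reason})

def regression_check(agent_fn) -> list:
    """Re-run every recorded failure; return those that still reproduce."""
    return [c for c in failure_cases
            if agent_fn(c["input"]) == c["bad_output"]]

record_failure("list open PRs", "deleted branch",
               "acted outside granted authority")
# Stub agent for illustration; a real agent call would go here.
still_failing = regression_check(lambda task: "listed 3 open PRs")
```

An empty `still_failing` list means no recorded failure reproduced, which is exactly the signal that can be fed back into the next context instead of a gut-feel review.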

Multiple agents are not about increasing headcount; they are about separating roles and responsibility

The idea of multi-persona agents or agent teams resembles human organization design. But increasing roles alone creates confusion.

Role 1: Explorer
Find needed facts from code, documents, and data. Its responsibility is confirmation, not judgment.

Role 2: Worker
Implement within a defined scope. Its responsibility is the requested scope and existing rules, not free expansion.

Role 3: Reviewer
Before the result is used, check failure conditions, diffs, and unverified items as the last safety layer.
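The three-role split can be sketched as a pipeline in which each function owns exactly one responsibility. All function names, data, and failure conditions here are illustrative.

```python
def explorer(question: str, sources: dict) -> dict:
    # Responsibility: confirm facts, not make judgments.
    return {k: v for k, v in sources.items()
            if question.lower() in v.lower()}

def worker(facts: dict, scope: str) -> str:
    # Responsibility: implement within the requested scope only.
    return f"change within '{scope}', based on {len(facts)} confirmed fact(s)"

def reviewer(result: str, failure_conditions: list) -> bool:
    # Responsibility: the last safety gate before the result is used.
    return not any(cond(result) for cond in failure_conditions)

facts = explorer("retry", {"docs": "Retry policy: backoff only."})
draft = worker(facts, scope="fix flaky test")
approved = reviewer(draft, [lambda r: "delete" in r.lower()])  # → True
```

The separation matters more than the count: the worker never reviews its own output, and the reviewer only blocks or passes, mirroring the responsibility boundaries described above.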

As AI agents increase, human work does not simply disappear; it shifts toward designing evaluation criteria and context.

Source Ideas Referenced in This Article

This note integrates source ideas around AI evaluation, context optimization, CLAUDE.md, multi-source context, multi-agent design, and development-support OSS.

  1. AI evaluation becoming a new compute bottleneck
  2. CLAUDE.md as a way to improve coding-agent ability
  3. Agency Agents and multi-persona agent frameworks
  4. Contextualizing agents across multiple data sources