
Practitioner Guide

How we build and operate with AI agents day-to-day.

Table of contents

Agent Architecture Patterns

Agent Anatomy

Every agent, regardless of framework or vendor, is composed of five core components.

| Component | What It Does | Example |
| --- | --- | --- |
| Model | The reasoning engine. Makes decisions, generates outputs, selects tools. | GPT-5.4, Claude Opus 4.6, Gemini 3 Pro |
| Tools | External functions, APIs, or systems the agent can invoke. | Database queries, web search, code execution, file I/O |
| Instructions | Explicit guidelines, scope constraints, and behavioral rules. | System prompts, AGENTS.md, rules files, policy documents |
| Memory | Context that persists across interactions — short-term (conversation) and long-term (vector stores, key-value). | Conversation history, session state, project context |
| Retrieval | Mechanisms to access external knowledge not in the model's training data. | RAG pipelines, document search, knowledge bases |

These five components are the atoms. Everything else — workflows, agents, multi-agent systems — is a molecule built from them.
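The anatomy above can be sketched as a single data structure. This is an illustrative Python sketch, not the API of any real framework; every name here (`Agent`, `remember`, `retriever`) is an assumption for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Illustrative only: one field per core component of the anatomy.
@dataclass
class Agent:
    model: str                                                  # Model: the reasoning engine
    tools: dict[str, Callable] = field(default_factory=dict)    # Tools: invocable capabilities
    instructions: str = ""                                      # Instructions: system prompt, rules
    memory: list[str] = field(default_factory=list)             # Memory: short-term conversation state
    retriever: Optional[Callable[[str], list[str]]] = None      # Retrieval: external knowledge access

    def remember(self, message: str) -> None:
        self.memory.append(message)

agent = Agent(model="example-model", instructions="Answer concisely.")
agent.tools["search"] = lambda q: f"results for {q}"   # a stand-in data tool
agent.remember("user: hello")
```

Workflows and multi-agent systems are then compositions over instances like this one.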

Workflows vs. Agents

A critical architectural distinction, first articulated by Anthropic, governs every design decision downstream:

| Dimension | Workflows | Agents |
| --- | --- | --- |
| Control | Predefined code paths orchestrate the LLM | LLM dynamically directs its own processes |
| Predictability | High — you know the execution path | Lower — the model decides what to do next |
| Flexibility | Low — changes require code changes | High — adapts to novel inputs |
| Best for | Well-defined, repeatable tasks | Open-ended problems with unpredictable steps |
| Cost/latency | Lower — fewer LLM calls, fixed paths | Higher — more calls, dynamic routing |
| Error handling | Programmatic gates and checks | Agent must self-correct or escalate |

The decision rule: use workflows when you can define the task decomposition in advance; use agents when the decomposition depends on the input.
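The distinction is easiest to see in code. In this hypothetical sketch, `workflow` hard-codes the execution path while `agent_loop` lets a model call (stubbed as `llm`) choose each step; all names are illustrative, not a framework API:

```python
# Workflow: the code owns the control flow; the LLM only fills in each step.
def workflow(doc: str, llm) -> str:
    summary = llm(f"Summarize: {doc}")             # step 1 is always summarization
    return llm(f"Translate to French: {summary}")  # step 2 is always translation

# Agent: the LLM owns the control flow; the code just executes its choices.
def agent_loop(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    state = task
    for _ in range(max_steps):                     # hard iteration cap (a guardrail)
        action = llm(f"Choose the next action for: {state}")
        if action == "done":                       # the model decides when to stop
            break
        state = tools[action](state)               # dynamic routing to a tool
    return state
```

Note that the workflow's cost is fixed (two calls), while the agent's cost depends on how many steps the model chooses to take.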

Tool Types

Tools fall into three types. This classification determines how you design, test, and permission agent capabilities:

Data tools — Retrieve context and information. Read-only, low-risk. Examples: Query databases, read documents, search the web, pull CRM records.

Action tools — Interact with systems to change state. Write operations, higher risk. Examples: Send emails, update records, create tickets, issue refunds, deploy code.

Orchestration tools — Other agents exposed as tools. Meta-level coordination. Examples: A “research agent” callable by a “manager agent,” a specialist agent invoked by a triage agent.

Risk rating should be assigned per tool: low (read-only), medium (reversible writes), high (irreversible writes, financial impact, external-facing). These ratings feed directly into the Guardrail Stack.
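One way to make per-tool risk ratings operational is to attach them at registration time and gate high-risk invocations on explicit human approval. A minimal sketch under assumed names (`register`, `invoke` are not from any real library):

```python
from enum import Enum
from typing import Callable

class Risk(Enum):
    LOW = "low"        # read-only data tools
    MEDIUM = "medium"  # reversible writes
    HIGH = "high"      # irreversible writes, financial impact, external-facing

REGISTRY: dict[str, tuple[Callable, Risk]] = {}

def register(name: str, fn: Callable, risk: Risk) -> None:
    REGISTRY[name] = (fn, risk)

def invoke(name: str, arg: str, approved: bool = False) -> str:
    fn, risk = REGISTRY[name]
    # High-risk tools require an explicit approval flag (fed by the Guardrail Stack).
    if risk is Risk.HIGH and not approved:
        raise PermissionError(f"{name} requires human approval")
    return fn(arg)

register("query_db", lambda q: f"rows for {q}", Risk.LOW)        # data tool
register("issue_refund", lambda o: f"refunded {o}", Risk.HIGH)   # action tool
```

Orchestration tools fit the same registry: a sub-agent is just a callable whose risk is the maximum risk of the tools it can reach.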

Composition Patterns

Eight patterns, ordered from simplest to most complex. The rule: always start at the top and move down only when the simpler pattern demonstrably fails.

Pattern 1

Prompt Chaining

Decompose a task into a fixed sequence of steps. Each LLM call processes the output of the previous one. Programmatic gates between steps validate intermediate results.

When to use

Task can be cleanly decomposed into fixed subtasks. You trade latency for accuracy by making each call simpler.

Example

Generate marketing copy, then translate it. Write an outline, validate it against criteria, then write the full document.
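A prompt chain with a programmatic gate between steps might look like the following; `llm` is a stand-in for any model call, and the gate here is deliberately trivial (a non-empty check) where a real chain would validate against concrete criteria:

```python
def chain(topic: str, llm) -> str:
    outline = llm(f"Write an outline for: {topic}")
    # Programmatic gate: validate the intermediate result before spending
    # another model call on it.
    if not outline.strip():
        raise ValueError("gate failed: empty outline")
    return llm(f"Write the full document from this outline: {outline}")
```

Each call handles a simpler subtask than a single monolithic prompt would, which is where the accuracy-for-latency trade comes from.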

Pattern Selection Guide

Always start at the top. Move down only when the simpler pattern demonstrably fails.

| Start with | Failure signal | Move to |
| --- | --- | --- |
| Single LLM call with good prompting | Output quality insufficient | Prompt Chaining or Evaluator-Optimizer |
| Prompt Chaining | Task decomposition isn't fixed | Single Agent Loop |
| Single Agent Loop | Too many tools (>15) or overlapping concerns | Manager or Orchestrator-Workers |
| Single Agent Loop | Distinct categories with different handling | Routing + specialized agents |
| Manager pattern | Central agent bottlenecks; specialists need full autonomy | Decentralized Handoff |

Universal rule: Maximize a single agent's capabilities before splitting into multiple agents.


The Guardrail Stack

Five layers of defense, grounded in Principle 4: Guardrails Are Non-Negotiable. Each layer addresses a different failure mode. All five must be in place before scaling agent operations.

Fleet-level controls for managing agents at organizational scale.

| Element | Description |
| --- | --- |
| Registry | Single source of truth tracking all agents, their capabilities, owners, and status |
| Access control | Role-based permissions determining which agents can access which systems and data |
| Observability | Unified monitoring across all agents — execution traces, cost tracking, error rates, latency |
| Interoperability | Standards for agents to work across platforms and teams (e.g., Model Context Protocol) |
| Audit trail | Complete record of agent actions, decisions, and outcomes for compliance and debugging |
| Cost budgeting | Per-agent and per-team token/compute budgets with alerts and hard limits |
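As an illustration of the cost-budgeting element, a per-agent token budget with a soft alert threshold and a hard limit could be as simple as this sketch (class name and the 80% alert ratio are assumptions, not a standard):

```python
class TokenBudget:
    """Per-agent token budget: soft alert at a ratio, hard refusal at the limit."""

    def __init__(self, limit: int, alert_ratio: float = 0.8):
        self.limit = limit
        self.alert_ratio = alert_ratio
        self.used = 0

    def spend(self, tokens: int) -> str:
        # Hard limit: refuse the call rather than exceed the budget.
        if self.used + tokens > self.limit:
            raise RuntimeError("budget exceeded: hard limit reached")
        self.used += tokens
        # Soft limit: surface an alert while still allowing the call.
        if self.used >= self.alert_ratio * self.limit:
            return "alert: approaching budget"
        return "ok"
```

In a real fleet, `spend` would be called by the orchestration layer before each model invocation, with the alert wired to observability.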

Ensure humans retain authority over decisions that agents must not make autonomously.

| Element | Description |
| --- | --- |
| Architecture | System design, technology choices, data model changes |
| Risk acceptance | Shipping known tradeoffs, accepting technical debt |
| Release timing | When code goes to production |
| Incident response | Rollback decisions, postmortem actions |
| Security-critical changes | Authentication, authorization, encryption |
| Cost commitments | Actions with financial impact above defined thresholds |

Enforce safety, compliance, and organizational rules that automated quality checks cannot catch.

| Element | Description | Failure prevented |
| --- | --- | --- |
| No secret exposure | Automated secret scanning in pre-commit and CI | Credentials leaking into repositories |
| PII filtering | LLM-based or regex-based PII detection on outputs | Privacy violations in generated content |
| Safety classification | Detect prompt injection, jailbreak attempts | System exploitation |
| Relevance classification | Flag off-topic or out-of-scope agent behavior | Scope drift and waste |
| Moderation | Content safety checks on agent outputs | Harmful or inappropriate generated content |
| Dependency policy | Block unsafe dependency upgrades or additions | Supply chain attacks |
| Branch policy | No direct pushes to main/protected branches | Unreviewed code reaching production |
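As a concrete (and deliberately minimal) illustration of regex-based PII filtering on agent outputs, the patterns below are simplified examples; production detection needs far broader coverage and usually an LLM-based second pass:

```python
import re

# Simplified example patterns; real PII detection needs much wider coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace each detected PII span with a labeled redaction marker."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```

A filter like this sits between the agent and anything user-facing, so a policy violation becomes a redaction rather than an incident.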

Enforce code and output correctness through automated checks before any human review.

| Element | Description |
| --- | --- |
| Formatting & linting | Enforce style consistency (Black, ESLint, Prettier, etc.) |
| Type checking | Static type verification (mypy, TypeScript strict mode) |
| Unit & integration tests | Existing test suite must pass; new code must include tests |
| Static analysis | Security scanning, dependency vulnerability checks |
| Coverage thresholds | No regressions in test coverage |
| Design system compliance | Agent-generated UI follows the component library and design tokens |
| Accessibility standards | WCAG compliance checks on generated interfaces |

Prevent agents from drifting beyond their assigned task — the most common failure mode in practice.

| Element | Description | Example |
| --- | --- | --- |
| Target | Specific files, directories, or systems the agent may touch | src/api/users/, payments_table |
| Non-goals | What the agent must NOT change | "Do not modify authentication logic" |
| Acceptance criteria | Concrete definition of "done" | "All tests pass, endpoint returns 200 with valid payload" |
| Allowed dependencies | What the agent may import or call | "No new external packages without approval" |
| Max iterations | Upper bound on agent execution cycles | 20 tool calls, 10 minutes wall time |
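A scope contract like this can be enforced mechanically before every write. The sketch below checks path targets, non-goals, and an iteration cap; all field and method names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ScopeContract:
    allowed_paths: tuple[str, ...]    # Target: what the agent may touch
    forbidden_paths: tuple[str, ...]  # Non-goals: what it must not change
    max_iterations: int               # upper bound on execution cycles
    iterations: int = 0

    def check_write(self, path: str) -> None:
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise RuntimeError("scope violation: iteration budget exhausted")
        if any(path.startswith(p) for p in self.forbidden_paths):
            raise PermissionError(f"scope violation: {path} is a non-goal")
        if not any(path.startswith(p) for p in self.allowed_paths):
            raise PermissionError(f"scope violation: {path} outside target")
```

The point is that drift stops being a matter of prompt discipline: an out-of-scope write fails loudly instead of landing in the diff.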

The Operating Loop

The Plan-Execute-Verify-Ship-Learn Cycle

This is the day-to-day execution model. It replaces ad-hoc prompting with a loop you can actually repeat and improve.

Plan (Product + Engineering)

Before any agent touches code, define the contract that bounds execution.

  • Product defines: Goal, acceptance criteria, UX requirements
  • Engineering defines: Scope, non-goals, risk level, constraints, verification method
  • The plan is not a suggestion — it is the contract that bounds agent execution
  • Without it, you get creative drift

Task Classification Matrix

Not every task is equally suited for agent execution. Classify tasks along two dimensions: boundedness (how well-defined is the scope?) and risk (what is the blast radius if the agent gets it wrong?).

Risk axis: Low, Medium, High. Boundedness axis: Well-bounded, Semi-bounded, Open-ended.

Agent-driven: Well-bounded / Low Risk

Automated verification. Sampling review.

Engineering
  • API endpoints and CRUD features
  • Code formatting, linting, and style fixes
  • Documentation and changelog generation
Product
  • Copy and microcopy generation within brand guidelines
  • Test case generation from acceptance criteria
  • Competitive analysis summaries from public data
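The matrix can be encoded as a lookup with a conservative default. Only the well-bounded/low-risk cell is taken from the matrix above; the other two entries are illustrative assumptions, and everything unlisted falls back to human-led:

```python
# Hypothetical mapping of (boundedness, risk) to an execution mode.
# Only the well-bounded/low entry is from the matrix; others are assumed.
MODES = {
    ("well-bounded", "low"): "agent-driven",
    ("well-bounded", "medium"): "agent drafts, human verifies",
    ("semi-bounded", "low"): "agent drafts, human verifies",
}

def classify(boundedness: str, risk: str) -> str:
    # Conservative default: anything not explicitly delegated stays human-led.
    return MODES.get((boundedness, risk), "human-led")
```

Defaulting to human-led encodes the rule below: when you don't know what failure looks like, a human leads until you do.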

The Human-Agent Boundary

The boundary is defined by a simple question: “If the agent gets this wrong, what happens?”

  • If the answer is “CI catches it and the PR is rejected” -> agent can execute autonomously
  • If the answer is “a customer sees wrong data” -> agent drafts, human verifies
  • If the answer is “we have a security breach” -> human leads, agent assists at most
  • If the answer is “we don’t know” -> human leads until we do know

The boundary moves over time. As guardrails improve, evaluation coverage increases, and confidence grows, tasks shift from human-led to agent-assisted to agent-driven. But you earn that movement through demonstrated reliability. You don’t assume it.

For how each traditional role transforms under these boundaries — from Software Engineer to Product Designer — see Roles in the AI Era. The Competency Evolution Explorer maps which competencies carry over, which are new, and which to sunset per role.


Maturity Model

A five-level maturity model, drawing on Eledath's "Levels of Agentic Engineering" framework and extending it with team-level dimensions, product integration, and failure modes. Each level describes where a team is, what it can do, what it needs to move up, and what risks come with the territory.

This is the canonical maturity model for the entire HELM framework. The Leadership Guide references this model and provides a quick-reference table for leadership conversations.

Level 1

Assisted

AI provides suggestions that developers accept, modify, or reject. The developer drives all decisions and execution.

"We use Copilot for suggestions"
Dimensions
| Dimension | Level 1: Assisted |
| --- | --- |
| Capabilities | Tab completion, inline suggestions, single-turn Q&A, code explanation |
| Tools | GitHub Copilot, ChatGPT, basic AI-assisted IDE features |
| Human role | Full control. AI is a passive tool. |
| Agent autonomy | None. Every output requires explicit human action. |
| Product dimension | PMs and designers use AI for ad-hoc tasks (drafting docs, brainstorming). No integration with engineering workflows. |
| Risk profile | Low. Developer reviews every suggestion. |
Assessment Criteria
  • Developers use AI for suggestions but control all execution
  • No structured prompting or context engineering
  • No shared rules or templates for AI usage
  • AI usage is individual, not team-standardized
Failure mode: Over-trust of suggestions without review; cargo-cult coding.

Critical reminder: The team’s effective level equals the level of the lowest-capability person in a critical-path role, not the highest-capability individual. A team with one Level 4 engineer and a Level 1 gatekeeper operates at Level 1.


Further Reading