
Multi-Agent Orchestration: Three Patterns That Actually Work

Priya Krishnaswamy · 2024-08-05 · 11 min read

You have a research synthesis pipeline. One agent searches the web, one reads documents, one writes the summary. You wire them together sequentially. At moderate volume, it works. Then you try to scale to 50 concurrent synthesis jobs and the whole thing stalls — the sequential chain serializes everything, and your p99 latency climbs from 40 seconds to 8 minutes. The architecture wasn't wrong for prototyping. It was wrong for production.

Multi-agent orchestration patterns exist because different workflows have fundamentally different parallelism and dependency structures. Getting the pattern wrong doesn't just hurt performance — it creates coordination failure modes that are genuinely difficult to debug once they're in production at scale.

Here are the three orchestration patterns we've seen in the vast majority of production agent systems, what each is good for, and where each breaks down.

Pattern 1: Sequential Pipeline

The simplest pattern: Agent A completes, passes output to Agent B, which completes, passes to Agent C. Each step depends on the full output of the previous step. This is the right pattern when your workflow is genuinely serial — when step B requires all of step A's output before it can meaningfully begin.

Sequential pipelines are the easiest to reason about, the easiest to trace, and the easiest to retry. If step B fails, you restart from B with A's cached output. The main failure mode isn't complexity — it's misapplication. Engineers reach for sequential pipelines when some steps could actually run in parallel, and the result is unnecessarily high latency.
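The retry behavior described above can be sketched in plain Python. This is a minimal illustration, not Diaflow's implementation: `run_pipeline`, the in-memory `cache` dict, and the `search` and `read` step functions are all hypothetical names introduced here, and a real system would persist the cache somewhere durable.

```python
from typing import Any, Callable

# (step name, function that takes the previous step's output)
Step = tuple[str, Callable[[Any], Any]]

def run_pipeline(steps: list[Step], initial: Any, cache: dict[str, Any]) -> Any:
    """Run steps in order, skipping any step whose output is already cached."""
    value = initial
    for name, fn in steps:
        if name in cache:
            value = cache[name]  # reuse the earlier result instead of re-running
            continue
        value = fn(value)        # may raise; outputs of earlier steps stay cached
        cache[name] = value
    return value

calls = {"search": 0, "read": 0}

def search(query: str) -> list[str]:
    calls["search"] += 1
    return [f"doc about {query}"]

def read(docs: list[str]) -> list[str]:
    calls["read"] += 1
    if calls["read"] == 1:
        raise RuntimeError("transient read failure")  # simulated first-attempt failure
    return [d.upper() for d in docs]

steps = [("search", search), ("read", read)]
cache: dict[str, Any] = {}
try:
    run_pipeline(steps, "fintech", cache)
except RuntimeError:
    pass  # step B failed; step A's output is still in the cache

result = run_pipeline(steps, "fintech", cache)  # resumes at "read"
```

On the second run the search step is not executed again, which is the property that makes sequential pipelines cheap to retry.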

The other failure mode is error propagation. In a sequential chain, a badly formatted output from step A surfaces as a failure at step B, and it can be hard to diagnose because the error message from B won't mention A. Explicit output schema validation at each handoff point is mandatory for production sequential pipelines.
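One way to make a handoff failure point at the step that caused it is to validate the producer's output before the consumer sees it. The sketch below assumes a search step that returns a list of dicts with `url` and `snippet` fields; that shape, along with `HandoffError`, `SearchResult`, and `validate_search_output`, is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

class HandoffError(Exception):
    """Raised when a step's output does not match what the next step expects."""

@dataclass
class SearchResult:
    url: str
    snippet: str

def validate_search_output(raw: object, producer: str = "search-agent") -> list[SearchResult]:
    """Check the producer's output at the handoff so errors name the producer."""
    if not isinstance(raw, list):
        raise HandoffError(f"{producer}: expected a list, got {type(raw).__name__}")
    results = []
    for i, item in enumerate(raw):
        if not isinstance(item, dict):
            raise HandoffError(f"{producer}: item {i} is {type(item).__name__}, expected dict")
        missing = {"url", "snippet"} - set(item)
        if missing:
            raise HandoffError(f"{producer}: item {i} is missing fields {sorted(missing)}")
        results.append(SearchResult(url=str(item["url"]), snippet=str(item["snippet"])))
    return results
```

Because the exception is raised at the boundary and carries the producer's name, the trace reads "search-agent: item 0 is missing fields ['snippet']" rather than an unrelated error deep inside the read step.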

from diaflow import Agent, Pipeline

# Sequential pipeline: each step waits for previous
pipeline = Pipeline(
    name="research-synthesis",
    steps=[
        Agent(name="search-agent",    model="claude-sonnet-4-6"),
        Agent(name="read-agent",      model="claude-sonnet-4-6"),
        Agent(name="synthesis-agent", model="claude-opus-4-7"),  # heavier model only for final
    ],
    handoff="sequential",
    validate_handoffs=True  # schema-validate output before next step
)

result = pipeline.run(query="latest APAC fintech regulatory changes Q1 2025")

# Note: code examples are illustrative —
# actual SDK usage requires a Diaflow account.

Note the model selection above: using Claude Opus 4.7 only at the synthesis step (where nuanced reasoning matters) and Claude Sonnet 4.6 at the earlier stages keeps token cost reasonable without compromising final output quality. This is a common production optimization — not every step needs the most capable model.

Pattern 2: Parallel Fan-Out with Aggregation

The pattern that resolves the latency problem: a coordinator agent spawns N worker agents simultaneously, collects their outputs, and aggregates the results. This is the right pattern when your subtasks are independent — when worker 2's execution doesn't depend on worker 1's output.

A realistic example: a due diligence agent that simultaneously runs a company financial analysis, a competitor analysis, a regulatory risk scan, and a technology stack assessment. These four sub-tasks are independent. Running them sequentially for 50 deals means 200 sequential LLM calls; running them in parallel means 50 rounds of 4 parallel calls. The theoretical ceiling is a 4x latency reduction, but each round takes as long as its slowest worker, so in practice latency drops by roughly 3x at this workload shape.
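The arithmetic behind that estimate is worth making explicit. The per-call latencies below are made-up numbers, not measurements; they only show why four parallel workers tend to deliver somewhat less than a 4x speedup when the calls take different amounts of time.

```python
# Assumed per-call latencies in seconds for the four analyses (hypothetical numbers).
worker_latencies = {
    "financial": 10.0,
    "competitor": 12.0,
    "regulatory": 15.0,
    "tech_stack": 20.0,
}

sequential_per_deal = sum(worker_latencies.values())  # calls run one after another
parallel_per_deal = max(worker_latencies.values())    # a round ends with its slowest worker

speedup = sequential_per_deal / parallel_per_deal
print(f"sequential: {sequential_per_deal:.0f}s, "
      f"parallel: {parallel_per_deal:.0f}s, speedup: {speedup:.2f}x")
# prints "sequential: 57s, parallel: 20s, speedup: 2.85x"
```

The aggregation step adds further time on top of the parallel round, so the end-to-end gain is smaller still.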

from diaflow import Agent, Pipeline, AggregatorAgent

# Fan-out: 4 workers run in parallel, aggregator combines results
diligence_pipeline = Pipeline(
    name="due-diligence",
    steps=[
        Pipeline.fan_out(
            workers=[
                Agent(name="financial-analyst",   model="claude-sonnet-4-6"),
                Agent(name="competitor-analyst",  model="claude-sonnet-4-6"),
                Agent(name="regulatory-scanner",  model="claude-sonnet-4-6"),
                Agent(name="tech-stack-assessor", model="claude-sonnet-4-6"),
            ],
            max_concurrency=4,
            timeout_per_worker=120  # seconds — fail fast, don't hang
        ),
        AggregatorAgent(
            name="synthesis",
            model="claude-opus-4-7",
            aggregation_strategy="structured_merge"
        )
    ]
)

# company_data: the target company's profile, loaded elsewhere (not shown)
result = diligence_pipeline.run(company_profile=company_data)

The failure mode to watch for in parallel fan-out: partial failures. If 1 of 4 workers fails, do you fail the whole pipeline or proceed with 3 results? For some use cases (all 4 sections are mandatory for a valid report) you must fail the whole run. For others (3 sections are enough to proceed) you can tolerate partial success. Make this explicit in your pipeline config — don't let it be implicit.
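An explicit partial-failure policy can be as small as a single threshold. The following sketch uses only the standard library's `asyncio`, not the Diaflow SDK; `fan_out`, `min_success`, and the stub workers are hypothetical names for illustration.

```python
import asyncio

async def fan_out(workers, min_success, timeout_s):
    """Run all workers concurrently and require at least `min_success` results.

    `workers` maps a worker name to a zero-argument coroutine function.
    """
    names = list(workers)
    tasks = [asyncio.wait_for(workers[n](), timeout=timeout_s) for n in names]
    # return_exceptions=True collects failures instead of raising on the first one
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)

    results = {n: o for n, o in zip(names, outcomes) if not isinstance(o, Exception)}
    failures = {n: o for n, o in zip(names, outcomes) if isinstance(o, Exception)}
    if len(results) < min_success:
        raise RuntimeError(
            f"only {len(results)}/{len(names)} workers succeeded; failed: {sorted(failures)}"
        )
    return results, failures

async def ok():
    return "section text"

async def boom():
    raise ValueError("upstream API error")  # simulated worker failure

workers = {"financial": ok, "competitor": ok, "regulatory": boom, "tech_stack": ok}

# Tolerant policy: three of four sections are enough to proceed.
results, failures = asyncio.run(fan_out(workers, min_success=3, timeout_s=1.0))
```

Setting `min_success=4` turns the same function into the strict policy, where any single failure fails the whole run. Either way the choice is visible at the call site rather than buried in default behavior.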

Pattern 3: Hierarchical Delegation (Subagent Trees)

The most flexible and the most dangerous pattern: a supervisor agent decomposes a task and delegates subtasks to specialized worker agents, potentially recursively. Worker agents may themselves spin up sub-workers. This is the right pattern for genuinely open-ended tasks where the decomposition itself requires intelligence — you don't know in advance which sub-tasks will be needed.

A DevOps incident triage agent might use hierarchical delegation: the root agent receives a PagerDuty alert and, based on the alert type, decides whether to spawn a "check database metrics" worker, a "check deployment history" worker, or both. Those workers may further delegate to query-specific sub-agents. The tree structure is dynamic.

We're not saying hierarchical delegation is bad — we're saying it's the hardest pattern to operate in production. The three specific problems you must solve before shipping a hierarchical agent system:

Maximum depth enforcement. Without a hard ceiling on tree depth, a poorly worded task can cause recursive delegation that spawns hundreds of agents. Set a depth limit (typically 3-4 for most use cases) and enforce it as a hard error, not as soft guidance.
Cost attribution. When a root agent spawns workers that spawn sub-workers, your token costs become non-obvious. You need per-subtree cost tracking before you can answer "why did this single user request cost $0.85 in LLM calls?"
Partial completion semantics. If a subtree fails mid-execution, what is the state of the parent? You need explicit handling for partial completion, not just success/failure.
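The first two requirements can be sketched with a small tree structure. This is an illustrative model only: `AgentNode`, `MAX_DEPTH`, `DepthLimitExceeded`, and the dollar figures are hypothetical, and the sketch does not address partial completion, which needs its own state handling.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 3  # assumed ceiling; the root agent is at depth 0

class DepthLimitExceeded(Exception):
    """Raised when an agent tries to delegate past MAX_DEPTH."""

@dataclass
class AgentNode:
    name: str
    depth: int = 0
    own_cost_usd: float = 0.0
    children: list["AgentNode"] = field(default_factory=list)

    def spawn(self, name: str) -> "AgentNode":
        """Create a child agent, refusing to go deeper than MAX_DEPTH."""
        if self.depth + 1 > MAX_DEPTH:
            raise DepthLimitExceeded(
                f"{self.name} at depth {self.depth} cannot delegate to {name}"
            )
        child = AgentNode(name=name, depth=self.depth + 1)
        self.children.append(child)
        return child

    def subtree_cost_usd(self) -> float:
        """Cost of this agent plus everything it delegated to, recursively."""
        return self.own_cost_usd + sum(c.subtree_cost_usd() for c in self.children)

# Example tree with made-up costs
root = AgentNode("triage", own_cost_usd=0.10)
db = root.spawn("check-database-metrics")
db.own_cost_usd = 0.25
query = db.spawn("slow-query-lookup")
query.own_cost_usd = 0.50
```

Calling `subtree_cost_usd()` on any node answers the attribution question for that branch, and the depth check fails loudly at the moment of delegation instead of after hundreds of agents have already been spawned.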

The Pattern Nobody Talks About: Sequential-Within-Parallel

Real production systems rarely use a single pure pattern. A common hybrid: parallel fan-out at the top level, with each parallel branch running a sequential pipeline internally. For example, a market monitoring system might fan out across 10 data sources in parallel, with each parallel branch running sequential steps of (fetch → parse → normalize → score). The outer structure is parallel; the inner structure is sequential.

This hybrid is more common than pure patterns in practice, and it's also where orchestration frameworks get complex. You need clean composition semantics — the ability to nest a sequential pipeline inside a parallel fan-out without the internal steps leaking their state into the outer context. Framework design matters here: frameworks that conflate orchestration and execution make this composition harder than it needs to be.
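A minimal sketch of this composition using plain `asyncio` rather than any particular framework: each branch is an ordinary coroutine that runs its steps in order, and the outer layer only sees the branch's final return value. The four step functions are trivial stand-ins for real fetch, parse, normalize, and score logic.

```python
import asyncio

# Stand-in steps; real ones would call APIs, parsers, and scoring models.
async def fetch(source):
    return f"raw:{source}"

async def parse(raw):
    return raw.removeprefix("raw:")

async def normalize(parsed):
    return parsed.lower()

async def score(normalized):
    return {"source": normalized, "score": len(normalized)}

async def branch(source):
    """Inner sequential pipeline; intermediate values stay local to this call."""
    value = source
    for step in (fetch, parse, normalize, score):
        value = await step(value)
    return value

async def monitor(sources):
    """Outer fan-out: one sequential branch per source, all running concurrently."""
    return await asyncio.gather(*(branch(s) for s in sources))

scores = asyncio.run(monitor(["Reuters", "SEC-Filings"]))
```

Keeping the intermediate `value` inside `branch` is what prevents state from leaking: the outer context never holds a raw or half-normalized payload, only finished scores in the same order as the input sources.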

Choosing the Right Pattern

The decision framework is simpler than it might appear. Ask: do the sub-tasks have dependency relationships? If yes and they're linear: sequential pipeline. If yes and they're hierarchical: hierarchical delegation with depth limits. If no: parallel fan-out. If the answer is "some do, some don't": you're in hybrid territory — break the problem into phases, with each phase using the right pattern for that phase's dependency structure.
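Written out as code, the decision is a four-way branch. The `Dependencies` enum and `choose_pattern` function are hypothetical names used only to restate the rule above, not part of any SDK.

```python
from enum import Enum

class Dependencies(Enum):
    NONE = "none"                  # sub-tasks are independent
    LINEAR = "linear"              # each step needs the previous step's output
    HIERARCHICAL = "hierarchical"  # decomposition is decided at run time
    MIXED = "mixed"                # some sub-tasks depend on others, some don't

def choose_pattern(deps: Dependencies) -> str:
    """Map a workflow's dependency structure to an orchestration pattern."""
    if deps is Dependencies.NONE:
        return "parallel fan-out"
    if deps is Dependencies.LINEAR:
        return "sequential pipeline"
    if deps is Dependencies.HIERARCHICAL:
        return "hierarchical delegation with depth limits"
    return "hybrid: split into phases, one pattern per phase"
```

The hard part is classifying the workflow honestly, not the mapping itself.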

The biggest mistake we see is choosing an orchestration pattern based on what's easiest to implement rather than what matches the actual dependency structure of the problem. Sequential pipelines are easiest to write. That's not a reason to use one when your sub-tasks are actually independent.

The code examples in this post are illustrative of Diaflow SDK patterns. Actual implementation requires a Diaflow account and may differ from preview API shapes. See our documentation for current SDK reference.
