Agent Harness

Compositional execution-state management for long-horizon agentic code generation.

Agent Harness provides a robust framework for managing execution state in long-horizon agentic code generation loops. It tracks data dependencies across code steps, intelligently detects which steps need re-execution after feedback, generates targeted patches, and replays only the affected subgraph—enabling efficient iterative refinement of generated code.

Built for AI agent systems that must handle stateful, multi-turn code synthesis with automatic error recovery and deterministic replay semantics. Integrates seamlessly with LLM-based feedback loops, AST analysis, and execution sandboxes.

Quick Start

pip install agent-harness

from agent_harness.esg import ExecutionStateGraph
from agent_harness.executor import Executor
from agent_harness.feedback_interpreter import diagnose
from agent_harness.patch_generator import generate_patch

# Create an execution state graph
esg = ExecutionStateGraph()

# Add and execute steps
step_id = esg.add_step("step_0", "x = 5\ny = x + 3")
executor = Executor(namespace={})
result = executor.execute(step_id, "x = 5\ny = x + 3")

# Record outputs and track dependencies
esg.record_output(step_id, result.namespace)

# Add dependent steps with automatic edge detection
step_id_2 = esg.add_step("step_1", "z = y * 2")
esg.add_edge("step_0", "step_1")

# Diagnose failures and generate patches
diagnosis = diagnose(result, esg)
patch = generate_patch(diagnosis, esg)

# Replay affected subgraph
stale_steps = esg.mark_stale("step_0")
replay_order = esg.replay_from("step_0")

What Can You Do?

Dependency Tracking

Automatically infer read/write dependencies across code steps using AST analysis. Track which variables each step consumes and produces.

from agent_harness.dependency_analyzer import get_reads, get_writes, infer_edges

reads = get_reads("z = y * 2")  # {'y'}
writes = get_writes("x = 5; y = x + 3")  # {'x', 'y'}

steps = [("step_0", "x = 5"), ("step_1", "y = x + 3")]
edges = infer_edges(steps)  # [('step_0', 'step_1')]

Intelligent Re-execution

Query ancestor steps needed to recompute a variable, mark stale steps, and replay only affected subgraphs.

ancestors = esg.get_ancestors("step_2", {"x"})
stale = esg.mark_stale("step_0")
subgraph = esg.get_subgraph_from("step_1")

Feedback-Driven Patching

Diagnose execution failures and generate targeted code patches from error feedback.

diagnosis = diagnose(step_result, esg)
patch = generate_patch(diagnosis, esg)

Architecture

Agent Harness organizes execution state as a directed acyclic graph (DAG) where each node represents a code step and edges represent data dependencies.

ExecutionStateGraph (esg.py) manages the DAG, tracking step metadata, output namespaces, and staleness. DependencyAnalyzer (dependency_analyzer.py) uses AST introspection to infer variable read/write patterns and automatically detect dependencies. Executor (executor.py) runs steps in isolated namespaces and captures results. FeedbackInterpreter (feedback_interpreter.py) analyzes execution failures to produce diagnostic reports. PatchGenerator (patch_generator.py) synthesizes code patches from diagnoses.

The flow: steps are added → dependencies inferred → steps executed → outputs recorded → feedback triggers diagnosis → patches generated → staleness propagated → subgraph replayed.

API Reference

ExecutionStateGraph

esg = ExecutionStateGraph()
step_id = esg.add_step(step_id: str, source: str) -> str
esg.record_output(step_id: str, namespace: dict)
esg.add_edge(src_step_id: str, dst_step_id: str) -> None
ancestors = esg.get_ancestors(step_id: str, variable_names: set[str] | None = None) -> list[str]
subgraph = esg.get_subgraph_from(step_id: str) -> list[str]
esg.mark_stale(step_id: str) -> None
replay_order = esg.replay_from(step_id: str) -> list[str]
node = esg.get_node(step_id: str) -> dict[str, Any]
graph = esg.graph() -> nx.DiGraph

Executor

executor = Executor(namespace={})
result = executor.execute(step_id: str, source: str) -> StepResult
ns = result.namespace  # dict[str, Any]

Feedback & Patches

diagnosis = diagnose(step_result: StepResult, esg: ExecutionStateGraph) -> DiagnosisReport
patch = generate_patch(diagnosis: DiagnosisReport, esg: ExecutionStateGraph) -> PatchResult

Dependency Analysis

from agent_harness.dependency_analyzer import get_reads, get_writes, infer_edges

reads = get_reads(source: str) -> set[str]
writes = get_writes(source: str) -> set[str]
edges = infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]]

Testing

Agent Harness includes 268 comprehensive tests. Run the full suite with:

pytest tests/

Contributing

Contributions welcome! Please open issues and pull requests on GitHub. See docs at lumi-node.github.io/agent-harness.

License

MIT License — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
assets		assets
docs		docs
paper		paper
src/agent_harness		src/agent_harness
tests		tests
.gitignore		.gitignore
.opus_build.log		.opus_build.log
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
journal-agent-harness.mdx		journal-agent-harness.mdx
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agent Harness

Quick Start

What Can You Do?

Dependency Tracking

Intelligent Re-execution

Feedback-Driven Patching

Architecture

API Reference

ExecutionStateGraph

Executor

Feedback & Patches

Dependency Analysis

Testing

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agent Harness

Quick Start

What Can You Do?

Dependency Tracking

Intelligent Re-execution

Feedback-Driven Patching

Architecture

API Reference

ExecutionStateGraph

Executor

Feedback & Patches

Dependency Analysis

Testing

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages