Skip to content

Lumi-node/agent-harness

Repository files navigation

Agent Harness

Agent Harness

Compositional execution-state management for long-horizon agentic code generation.

License Python Tests


Agent Harness provides a robust framework for managing execution state in long-horizon agentic code generation loops. It tracks data dependencies across code steps, intelligently detects which steps need re-execution after feedback, generates targeted patches, and replays only the affected subgraph—enabling efficient iterative refinement of generated code.

Built for AI agent systems that must handle stateful, multi-turn code synthesis with automatic error recovery and deterministic replay semantics. Integrates seamlessly with LLM-based feedback loops, AST analysis, and execution sandboxes.


Quick Start

pip install agent-harness
from agent_harness.esg import ExecutionStateGraph
from agent_harness.executor import Executor
from agent_harness.feedback_interpreter import diagnose
from agent_harness.patch_generator import generate_patch

# Create an execution state graph
esg = ExecutionStateGraph()

# Add and execute steps
step_id = esg.add_step("step_0", "x = 5\ny = x + 3")
executor = Executor(namespace={})
result = executor.execute(step_id, "x = 5\ny = x + 3")

# Record outputs and track dependencies
esg.record_output(step_id, result.namespace)

# Add dependent steps with automatic edge detection
step_id_2 = esg.add_step("step_1", "z = y * 2")
esg.add_edge("step_0", "step_1")

# Diagnose failures and generate patches
diagnosis = diagnose(result, esg)
patch = generate_patch(diagnosis, esg)

# Replay affected subgraph
stale_steps = esg.mark_stale("step_0")
replay_order = esg.replay_from("step_0")

What Can You Do?

Dependency Tracking

Automatically infer read/write dependencies across code steps using AST analysis. Track which variables each step consumes and produces.

from agent_harness.dependency_analyzer import get_reads, get_writes, infer_edges

reads = get_reads("z = y * 2")  # {'y'}
writes = get_writes("x = 5; y = x + 3")  # {'x', 'y'}

steps = [("step_0", "x = 5"), ("step_1", "y = x + 3")]
edges = infer_edges(steps)  # [('step_0', 'step_1')]

Intelligent Re-execution

Query ancestor steps needed to recompute a variable, mark stale steps, and replay only affected subgraphs.

ancestors = esg.get_ancestors("step_2", {"x"})
stale = esg.mark_stale("step_0")
subgraph = esg.get_subgraph_from("step_1")

Feedback-Driven Patching

Diagnose execution failures and generate targeted code patches from error feedback.

diagnosis = diagnose(step_result, esg)
patch = generate_patch(diagnosis, esg)

Architecture

Agent Harness organizes execution state as a directed acyclic graph (DAG) where each node represents a code step and edges represent data dependencies.

ExecutionStateGraph (esg.py) manages the DAG, tracking step metadata, output namespaces, and staleness. DependencyAnalyzer (dependency_analyzer.py) uses AST introspection to infer variable read/write patterns and automatically detect dependencies. Executor (executor.py) runs steps in isolated namespaces and captures results. FeedbackInterpreter (feedback_interpreter.py) analyzes execution failures to produce diagnostic reports. PatchGenerator (patch_generator.py) synthesizes code patches from diagnoses.

The flow: steps are added → dependencies inferred → steps executed → outputs recorded → feedback triggers diagnosis → patches generated → staleness propagated → subgraph replayed.

API Reference

ExecutionStateGraph

esg = ExecutionStateGraph()
step_id = esg.add_step(step_id: str, source: str) -> str
esg.record_output(step_id: str, namespace: dict)
esg.add_edge(src_step_id: str, dst_step_id: str) -> None
ancestors = esg.get_ancestors(step_id: str, variable_names: set[str] | None = None) -> list[str]
subgraph = esg.get_subgraph_from(step_id: str) -> list[str]
esg.mark_stale(step_id: str) -> None
replay_order = esg.replay_from(step_id: str) -> list[str]
node = esg.get_node(step_id: str) -> dict[str, Any]
graph = esg.graph() -> nx.DiGraph

Executor

executor = Executor(namespace={})
result = executor.execute(step_id: str, source: str) -> StepResult
ns = result.namespace  # dict[str, Any]

Feedback & Patches

diagnosis = diagnose(step_result: StepResult, esg: ExecutionStateGraph) -> DiagnosisReport
patch = generate_patch(diagnosis: DiagnosisReport, esg: ExecutionStateGraph) -> PatchResult

Dependency Analysis

from agent_harness.dependency_analyzer import get_reads, get_writes, infer_edges

reads = get_reads(source: str) -> set[str]
writes = get_writes(source: str) -> set[str]
edges = infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]]

Testing

Agent Harness includes 268 comprehensive tests. Run the full suite with:

pytest tests/

Contributing

Contributions welcome! Please open issues and pull requests on GitHub. See docs at lumi-node.github.io/agent-harness.

License

MIT License — see LICENSE for details.

About

Compositional execution-state management for long-horizon agentic code generation

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors