Compositional execution-state management for long-horizon agentic code generation.
Agent Harness provides a robust framework for managing execution state in long-horizon agentic code generation loops. It tracks data dependencies across code steps, intelligently detects which steps need re-execution after feedback, generates targeted patches, and replays only the affected subgraph—enabling efficient iterative refinement of generated code.
Built for AI agent systems that must handle stateful, multi-turn code synthesis with automatic error recovery and deterministic replay semantics. Integrates seamlessly with LLM-based feedback loops, AST analysis, and execution sandboxes.
pip install agent-harnessfrom agent_harness.esg import ExecutionStateGraph
from agent_harness.executor import Executor
from agent_harness.feedback_interpreter import diagnose
from agent_harness.patch_generator import generate_patch
# Create an execution state graph
esg = ExecutionStateGraph()
# Add and execute steps
step_id = esg.add_step("step_0", "x = 5\ny = x + 3")
executor = Executor(namespace={})
result = executor.execute(step_id, "x = 5\ny = x + 3")
# Record outputs and track dependencies
esg.record_output(step_id, result.namespace)
# Add dependent steps with automatic edge detection
step_id_2 = esg.add_step("step_1", "z = y * 2")
esg.add_edge("step_0", "step_1")
# Diagnose failures and generate patches
diagnosis = diagnose(result, esg)
patch = generate_patch(diagnosis, esg)
# Replay affected subgraph
stale_steps = esg.mark_stale("step_0")
replay_order = esg.replay_from("step_0")Automatically infer read/write dependencies across code steps using AST analysis. Track which variables each step consumes and produces.
from agent_harness.dependency_analyzer import get_reads, get_writes, infer_edges
reads = get_reads("z = y * 2") # {'y'}
writes = get_writes("x = 5; y = x + 3") # {'x', 'y'}
steps = [("step_0", "x = 5"), ("step_1", "y = x + 3")]
edges = infer_edges(steps) # [('step_0', 'step_1')]Query ancestor steps needed to recompute a variable, mark stale steps, and replay only affected subgraphs.
ancestors = esg.get_ancestors("step_2", {"x"})
stale = esg.mark_stale("step_0")
subgraph = esg.get_subgraph_from("step_1")Diagnose execution failures and generate targeted code patches from error feedback.
diagnosis = diagnose(step_result, esg)
patch = generate_patch(diagnosis, esg)Agent Harness organizes execution state as a directed acyclic graph (DAG) where each node represents a code step and edges represent data dependencies.
ExecutionStateGraph (esg.py) manages the DAG, tracking step metadata, output namespaces, and staleness. DependencyAnalyzer (dependency_analyzer.py) uses AST introspection to infer variable read/write patterns and automatically detect dependencies. Executor (executor.py) runs steps in isolated namespaces and captures results. FeedbackInterpreter (feedback_interpreter.py) analyzes execution failures to produce diagnostic reports. PatchGenerator (patch_generator.py) synthesizes code patches from diagnoses.
The flow: steps are added → dependencies inferred → steps executed → outputs recorded → feedback triggers diagnosis → patches generated → staleness propagated → subgraph replayed.
esg = ExecutionStateGraph()
step_id = esg.add_step(step_id: str, source: str) -> str
esg.record_output(step_id: str, namespace: dict)
esg.add_edge(src_step_id: str, dst_step_id: str) -> None
ancestors = esg.get_ancestors(step_id: str, variable_names: set[str] | None = None) -> list[str]
subgraph = esg.get_subgraph_from(step_id: str) -> list[str]
esg.mark_stale(step_id: str) -> None
replay_order = esg.replay_from(step_id: str) -> list[str]
node = esg.get_node(step_id: str) -> dict[str, Any]
graph = esg.graph() -> nx.DiGraphexecutor = Executor(namespace={})
result = executor.execute(step_id: str, source: str) -> StepResult
ns = result.namespace # dict[str, Any]diagnosis = diagnose(step_result: StepResult, esg: ExecutionStateGraph) -> DiagnosisReport
patch = generate_patch(diagnosis: DiagnosisReport, esg: ExecutionStateGraph) -> PatchResultfrom agent_harness.dependency_analyzer import get_reads, get_writes, infer_edges
reads = get_reads(source: str) -> set[str]
writes = get_writes(source: str) -> set[str]
edges = infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]]Agent Harness includes 268 comprehensive tests. Run the full suite with:
pytest tests/Contributions welcome! Please open issues and pull requests on GitHub. See docs at lumi-node.github.io/agent-harness.
MIT License — see LICENSE for details.
