33 commits
eda68b2
feat: add action agent pipeline baseline
skywhite1024 Jun 13, 2026
2f68e9d
change config and image root
skywhite1024 Jun 14, 2026
64b04e2
update dexsim0.4.1
skywhite1024 Jun 14, 2026
adb29c8
fix demo1 basket
skywhite1024 Jun 14, 2026
f13f481
fix: normalize mesh frame generation
skywhite1024 Jun 14, 2026
0957ecb
fix temp glb 90 but without material
skywhite1024 Jun 14, 2026
c2f4427
fix normalizer glb
skywhite1024 Jun 14, 2026
ccea47d
conda activate base
skywhite1024 Jun 14, 2026
3a1109d
direction right
skywhite1024 Jun 15, 2026
52d9a75
fix one object error and robot high
skywhite1024 Jun 16, 2026
7dc0d84
fix affoardance
skywhite1024 Jun 16, 2026
fb473a2
fix: tighten action-agent atomic runtime schema
skywhite1024 Jun 16, 2026
93ea3ea
fix front and back
skywhite1024 Jun 16, 2026
7807d67
fix lower -> open hand -> retreat
skywhite1024 Jun 16, 2026
c607e30
fix: address action-agent runtime review cleanup
skywhite1024 Jun 17, 2026
7b15271
Fix action agent CoACD cache reuse
skywhite1024 Jun 17, 2026
063254c
fix: address action-agent review cleanup
skywhite1024 Jun 24, 2026
bdd24df
Add line arrangement config generation
skywhite1024 Jun 24, 2026
e75e3f2
Fix dual UR5 robot view semantics
skywhite1024 Jun 26, 2026
adcd3ba
fix: adapt action-agent runtime to typed atomic actions
skywhite1024 Jun 26, 2026
c5a23a7
style: format action-agent config generation test
skywhite1024 Jun 26, 2026
0daf471
fix: pass grasp mesh data to typed affordance
skywhite1024 Jun 26, 2026
557ab2f
Native action-agent atomic actions
skywhite1024 Jun 26, 2026
59e5100
Native relative pose-sensitive release flow
skywhite1024 Jun 26, 2026
f75c295
Fix pose-sensitive relative release height
skywhite1024 Jun 27, 2026
6ca279b
Use object-pose release for on placements
skywhite1024 Jun 27, 2026
ed4ad8b
improve arrangement_spec
skywhite1024 Jun 27, 2026
4733e16
fix Camera high
skywhite1024 Jun 27, 2026
f803dbc
Object Manipulation update
skywhite1024 Jun 27, 2026
21b45a8
fix Object Manipulation bug
skywhite1024 Jun 27, 2026
5467036
change ur solver
skywhite1024 Jun 28, 2026
bbc85e7
fix(sim): export UR solver module API
skywhite1024 Jun 28, 2026
f35defd
delete old action agent
skywhite1024 Jun 28, 2026
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,2 +1,3 @@
include VERSION
recursive-include configs/ *
recursive-include embodichain/gen_sim/action_agent_pipeline/generation/templates *.json
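The added `recursive-include` rule ships every `*.json` file found at any depth under the templates directory. A minimal sketch of that matching behaviour, approximated with `pathlib` rather than the packaging tools themselves, is shown below. The helper `recursive_include` and the file names are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def recursive_include(root: Path, pattern: str) -> list[str]:
    """Approximate MANIFEST.in `recursive-include <root> <pattern>` using rglob."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical template layout; the real directory contents are not shown in the diff.
    templates = Path(tmp) / "templates"
    (templates / "nested").mkdir(parents=True)
    (templates / "task.json").write_text("{}")
    (templates / "nested" / "agent.json").write_text("{}")
    (templates / "notes.txt").write_text("not packaged")

    # Only the JSON files match, including the one in the nested directory.
    matched = recursive_include(templates, "*.json")
```

Files that do not match the pattern, such as `notes.txt` here, would be left out of the source distribution unless another rule includes them.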
39 changes: 0 additions & 39 deletions docs/source/api_reference/embodichain/embodichain.agents.rst

This file was deleted.

11 changes: 2 additions & 9 deletions docs/source/api_reference/embodichain/embodichain.lab.sim.rst
@@ -78,14 +78,6 @@ Shapes
:show-inheritance:
:exclude-members: __init__, copy, replace, to_dict, validate

Atomic Actions
--------------

.. automodule:: embodichain.lab.sim.atom_actions
:members:
:undoc-members:
:show-inheritance:

Objects
-------

@@ -133,6 +125,7 @@ Atomic Actions
:maxdepth: 1

embodichain.lab.sim.atomic_actions

Shared Types
------------

@@ -147,4 +140,4 @@ Utility
.. toctree::
:maxdepth: 1

embodichain.lab.sim.utility
embodichain.lab.sim.utility
1 change: 0 additions & 1 deletion docs/source/api_reference/index.rst
@@ -22,7 +22,6 @@ The following modules are available in the core ``embodichain`` framework:
.. autosummary::
:toctree: embodichain

agents
data
data_pipeline
lab
197 changes: 45 additions & 152 deletions docs/source/features/generative_sim/agents.md
@@ -1,175 +1,68 @@
# EmbodiAgent(aborted)
# Action Agent Pipeline

EmbodiAgent is a hierarchical multi-agent system that enables robots to perform complex manipulation tasks through closed-loop planning, code generation, and validation. The system combines vision-language models (VLMs) and large language models (LLMs) to translate high-level goals into executable robot actions.
The action-agent pipeline is the supported agent workflow for generated tabletop
manipulation tasks. It converts an image or an existing generated gym project
into a task-specific simulation config, asks the task model for a JSON task
graph, compiles that graph into atomic-action specs, and executes it through the
`AtomicActionsAgent-v3` environment.

## Quick Start
The legacy Python-code generation agent stack has been removed. New demos and
task generation should use the modules under
`embodichain.gen_sim.action_agent_pipeline`.

### Prerequisites
Ensure you have access to Azure OpenAI or a compatible LLM endpoint.
## End-to-end Pipeline

```bash
# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
```

### Using Different LLM/VLM APIs
Run image-to-scene, config generation, and agent execution in one command:

The system uses LangChain's `AzureChatOpenAI` by default. To use different LLM/VLM providers, you can modify the `create_llm` function in `embodichain/agents/hierarchy/llm.py`.

#### Azure OpenAI
```bash
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export OPENAI_API_VERSION="2024-10-21" # Optional, defaults to "2024-10-21"
python -m embodichain.gen_sim.action_agent_pipeline.cli.run_agent_pipeline \
--use-image2scene \
--server "http://127.0.0.1:4523" \
--image-name "demo1" \
--task_description "Pick up the target object and place it in the basket." \
--config-output-dir "gym_project/action_agent_pipeline/configs/demo1_text" \
--task_name "Demo1_Text" \
--target_body_scale 0.8 \
--regenerate
```

#### OpenAI
To use OpenAI directly instead of Azure, modify `llm.py`:
```python
from langchain_openai import ChatOpenAI
## Generate Config Only

def create_llm(*, temperature=0.0, model="gpt-4o"):
return ChatOpenAI(
temperature=temperature,
model=model,
api_key=os.getenv("OPENAI_API_KEY"),
)
```
Use an existing gym project to generate the task config and agent config:

Then set:
```bash
export OPENAI_API_KEY="your-api-key"
```

#### Other Providers
You can use other LangChain-compatible providers by modifying the `create_llm` function, for example:

**Anthropic Claude:**
```python
from langchain_anthropic import ChatAnthropic

def create_llm(*, temperature=0.0, model="claude-3-opus-20240229"):
return ChatAnthropic(
temperature=temperature,
model=model,
anthropic_api_key=os.getenv("ANTHROPIC_API_KEY"),
)
python -m embodichain.gen_sim.action_agent_pipeline.cli.generate_action_agent_config \
--gym_project "gym_project/environment/image2tabletop/downloads/example_gym_project" \
--output_dir "gym_project/action_agent_pipeline/configs/demo_text" \
--task_name "Demo_Text" \
--task_description "Pick up the target object and place it in the basket." \
--target_body_scale 0.8 \
--overwrite
```

**Google Gemini:**
```python
from langchain_google_genai import ChatGoogleGenerativeAI
## Run Generated Config

def create_llm(*, temperature=0.0, model="gemini-pro"):
return ChatGoogleGenerativeAI(
temperature=temperature,
model=model,
google_api_key=os.getenv("GOOGLE_API_KEY"),
)
```

### Run the System

Run the agent system with the following command:
Run a previously generated config with the action-agent environment:

```bash
python embodichain/lab/scripts/run_agent.py \
--task_name YourTask \
--gym_config configs/gym/your_task/gym_config.yaml \
--agent_config configs/gym/agent/your_agent/agent_config.json \
--regenerate False
python -m embodichain.gen_sim.action_agent_pipeline.cli.run_agent \
--task_name "Demo_Text" \
--gym_config "gym_project/action_agent_pipeline/configs/demo_text/fast_gym_config.json" \
--agent_config "gym_project/action_agent_pipeline/configs/demo_text/agent_config.json" \
--regenerate
```

**Parameters:**
- `--task_name`: Name identifier for the task
- `--gym_config`: Path to the gym environment configuration file (``.json``, ``.yaml``, or ``.yml``)
- `--agent_config`: Path to the agent configuration file (defines prompts and agent behavior)
- `--regenerate`: If `True`, forces regeneration of plans/code even if cached

## System Architecture

The system operates on a closed-loop control cycle:

- **Observe**: The `TaskAgent` perceives the environment via multi-view camera inputs.
- **Plan**: It decomposes the goal into natural language steps.
- **Code**: The `CodeAgent` translates steps into executable Python code using atomic actions.
- **Execute**: The code runs in the environment; runtime errors are caught immediately.
- **Validate**: The `ValidationAgent` analyzes the result images, selects the best camera angle, and judges success.
- **Refine**: If validation fails, feedback is sent back to the agents to regenerate the plan or code.

---

## Core Components

### TaskAgent
*Located in:* `embodichain/agents/hierarchy/task_agent.py`

Responsible for high-level reasoning. It parses visual observations and outputs a structured plan.

* For every step, it generates a specific condition (e.g., "The cup must be held by the gripper") which is used later by the ValidationAgent.
* Prompt Strategies:
* `one_stage_prompt`: Direct VLM-to-Plan generation.
* `two_stage_prompt`: Separates visual analysis from planning logic.

### CodeAgent
*Located in:* `embodichain/agents/hierarchy/code_agent.py`

Translates natural language plans into executable Python code using atomic actions from the action bank.

* Generates Python code that follows strict coding guidelines (no loops, only provided APIs)
* Executes code in a sandboxed environment with immediate error detection
* Uses Abstract Syntax Tree (AST) parsing to ensure code safety and correctness
* Supports few-shot learning through code examples in the configuration


### ValidationAgent
*Located in:* `embodichain/agents/hierarchy/validation_agent.py`

Closes the loop by verifying if the robot actually achieved what it planned.

* Uses a specialized LLM call (`select_best_view_dir`) to analyze images from all cameras and pick the single best angle that proves the action's outcome, ignoring irrelevant views.
* If an error occurs (runtime or logic), it generates a detailed explanation which is fed back to the `TaskAgent` or `CodeAgent` for the next attempt.

---

## Configuration Guide

The `Agent` configuration block controls the context provided to the LLMs. Prompt files are resolved in the following order:

1. **Config directory**: Task-specific prompt files in the same directory as the agent configuration file (e.g., `configs/gym/agent/pour_water_agent/`)
2. **Default prompts directory**: Reusable prompt templates in `embodichain/agents/prompts/`

| Parameter | Description | Typical Use |
| :--- | :--- | :--- |
| `task_prompt` | Task-specific goal description | "Pour water from the red cup to the blue cup." |
| `basic_background` | Physical rules & constraints | World coordinate system definitions, safety rules. |
| `atom_actions` | API Documentation | List of available functions (e.g., `drive(action='pick', ...)`). |
| `code_prompt` | Coding guidelines | "Use provided APIs only. Do not use loops." |
| `code_example` | Few-shot examples | Previous successful code snippets to guide style. |

---

## File Structure

```text
embodichain/agents/
├── hierarchy/
│ ├── agent_base.py # Abstract base handling prompts & images
│ ├── task_agent.py # Plan generation logic
│ ├── code_agent.py # Code generation & AST execution engine
│ ├── validation_agent.py # Visual analysis & view selection
│ └── llm.py # LLM configuration and instances
├── mllm/
│ └── prompt/ # Prompt templates (LangChain)
└── prompts/ # Agent prompt templates
```
## Runtime Shape

---
- `TaskAgent` produces a deterministic JSON graph.
- `CompileAgent` caches and validates the graph artifact.
- `AgenticGenSimEnv` registers `AtomicActionsAgent-v3` and exposes
`create_demo_action_list()`.
- Runtime graph execution calls atomic actions from
`embodichain.gen_sim.action_agent_pipeline.runtime`.
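
The flow listed above (a JSON task graph that is validated and then flattened into ordered atomic-action specs) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the graph fields `nodes`, `id`, `action`, `args`, and `depends_on`, and the function `compile_task_graph`, are hypothetical and do not come from `embodichain.gen_sim.action_agent_pipeline`.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical task graph; the real schema used by the pipeline may differ.
TASK_GRAPH_JSON = """
{
  "nodes": [
    {"id": "grasp", "action": "pick", "args": {"target": "cup"}, "depends_on": []},
    {"id": "move", "action": "move_to", "args": {"target": "basket"}, "depends_on": ["grasp"]},
    {"id": "release", "action": "place", "args": {"target": "basket"}, "depends_on": ["move"]}
  ]
}
"""


def compile_task_graph(graph_json: str) -> list[dict]:
    """Validate a task graph and return action specs in dependency order."""
    graph = json.loads(graph_json)
    nodes = {node["id"]: node for node in graph["nodes"]}
    if len(nodes) != len(graph["nodes"]):
        raise ValueError("duplicate node id in task graph")
    for node in nodes.values():
        missing = [dep for dep in node["depends_on"] if dep not in nodes]
        if missing:
            raise ValueError(f"node {node['id']!r} depends on unknown nodes: {missing}")
    # TopologicalSorter raises graphlib.CycleError if the graph contains a cycle.
    sorter = TopologicalSorter({nid: n["depends_on"] for nid, n in nodes.items()})
    return [{"action": nodes[nid]["action"], **nodes[nid]["args"]} for nid in sorter.static_order()]


action_specs = compile_task_graph(TASK_GRAPH_JSON)
```

Validating the graph before execution means a malformed or cyclic plan fails at compile time rather than partway through a simulated episode.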

## See Also

- [Online Data Streaming](../online_data.md) — Streaming live simulation data for training
- [RL Architecture](../../overview/rl/index.rst) — RL training pipeline and algorithms
- [Atomic Actions Tutorial](../../tutorial/atomic_actions.rst) — Action primitives used by the CodeAgent
- [SimReady Asset Pipeline](simready_pipeline.md) — Generating simulation-ready assets
- [Atomic Actions Tutorial](../../tutorial/atomic_actions.rst) — Atomic action primitives
- [Supported Tasks](../../resources/task/index.rst) — Available task environments
1 change: 1 addition & 0 deletions docs/source/features/generative_sim/index.rst
@@ -6,4 +6,5 @@ Generative Simulation collects EmbodiChain features for generating simulation-re
.. toctree::
:maxdepth: 2

Action Agent Pipeline <agents.md>
SimReady Asset Pipeline <simready_pipeline.md>
5 changes: 2 additions & 3 deletions embodichain/agents/__init__.py
@@ -14,7 +14,6 @@
# limitations under the License.
# ----------------------------------------------------------------------------

from . import hierarchy
from . import mllm
from __future__ import annotations

__all__ = ["hierarchy", "mllm"]
__all__: list[str] = []
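
With an empty `__all__`, a star-import of the package exports no public names. The sketch below illustrates that general Python behaviour on a throwaway module; the module name `demo_agents` and its `hierarchy` attribute are hypothetical stand-ins, not the real `embodichain.agents` package.

```python
import sys
import types

# Build a throwaway module that defines a name but declares an empty __all__.
demo = types.ModuleType("demo_agents")
exec("hierarchy = object()\n__all__: list[str] = []", demo.__dict__)
sys.modules["demo_agents"] = demo

# Run a star-import into a fresh namespace and collect what it brought in.
namespace: dict = {}
exec("from demo_agents import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))

# Attribute access on the module itself is unaffected by __all__.
still_reachable = hasattr(demo, "hierarchy")
```

Here `exported` is empty even though the module defines `hierarchy`, because `__all__` alone decides what a star-import copies.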