Skip to content

docs: add python / matplotlib requirement example#1293

Merged
markstur merged 13 commits into
generative-computing:mainfrom
akihikokuroda:issue1025
Jun 26, 2026
Merged

docs: add python / matplotlib requirement example#1293
markstur merged 13 commits into
generative-computing:mainfrom
akihikokuroda:issue1025

Conversation

@akihikokuroda

@akihikokuroda akihikokuroda commented Jun 18, 2026

Copy link
Copy Markdown
Member

Pull Request

Issue

Fix: #1025

Description

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

@akihikokuroda akihikokuroda requested a review from a team as a code owner June 18, 2026 15:10
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 18, 2026
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated

@psschwei psschwei left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits but otherwise LGTM

Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
@akihikokuroda akihikokuroda requested a review from psschwei June 18, 2026 22:23
print(f"\n ✓ Graph saved to: {graph_path}")
print(f" File size: {graph_path.stat().st_size} bytes")
else:
print(f"\n ⚠ Graph file not found at {graph_path}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conftest passes the test when the script exits 0, and process_user_request always exits 0 — there's no sys.exit or raise when the graph isn't produced, just a print + return. So even with matplotlib installed, if the pipeline fails for any reason the test is recorded as passing whilst producing nothing. Worth raising here so failures are visible:

Suggested change
print(f"\n ⚠ Graph file not found at {graph_path}")
if not graph_path.exists():
raise RuntimeError(f"Graph was not created at {graph_path}")
print(f"\n ✓ Graph saved to: {graph_path}")
print(f" File size: {graph_path.stat().st_size} bytes")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the earlier fix. The latest commit (2b9d4f89) reverted this back to print + return — the raise RuntimeError that was there previously is now gone. Since the conftest passes the test on exit code 0, a missing graph still goes undetected. Please restore the raise RuntimeError (or a sys.exit(1)) so that test failures surface properly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread docs/examples/requirements/code_generation_and_execution.py
Comment thread docs/examples/requirements/README.md
@planetf1

Copy link
Copy Markdown
Contributor

Tiny one: the PR title has a typo — "requrement" should be "requirement". Worth fixing before merge as it ends up in the commit message.

Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example — the pipeline approach is nicely structured and the README additions are clear.

Two things I'd want addressed before merge:

  1. matplotlib guardmatplotlib isn't a project dependency so the example can't succeed on a clean checkout. The repair loop exhausts silently with no useful message to the user (see inline comment).
  2. Silent test pass — the conftest passes the test on exit 0, and process_user_request always exits 0 even when no graph is produced. The e2e test gives a false green in that case (see inline comment).

The other inline comments (execution tier caveat, README requirements section) are informational — happy for those to be addressed or tracked as follow-ups.

@akihikokuroda akihikokuroda changed the title docs: add python / matplotlib requrement example docs: add python / matplotlib requirement example Jun 19, 2026
@akihikokuroda akihikokuroda requested a review from planetf1 June 19, 2026 15:45
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from psschwei June 22, 2026 16:20

@psschwei psschwei left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @planetf1 has requested changes so will leave approval for him.

@markstur markstur left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see inline

Comment thread docs/examples/requirements/code_generation_and_execution.py Outdated
data = list(reader)

preview_lines = []
with open(csv_path) as f:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSV file is opened twice: once with csv.DictReader to read all rows (lines 59–61), and again to collect preview lines (lines 63–67). This doubles the I/O cost unnecessarily.

Since csv.DictReader already reads the raw lines, both purposes can be served in a single pass.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! fixed it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was consolidated but not fixed. The file is still read twice. I guess you want one pass for raw preview and the other pass for dict. So you chose not to try single pass and deal w/ unwanted format. Makes sense.

TO FIX: The preview loop only takes < 5, but then it continues to read the rest for no reason. Add a break if >= 5.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else
  break

added.

Returns:
Extracted Python code string, or None if extraction failed.
"""
from mellea.stdlib.requirements.python_reqs import _has_python_code_listing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not import at top?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments to explain.

Comment thread docs/examples/requirements/code_generation_and_execution.py

m = mellea.start_session()

output_path = f"{output_dir}/graph_{request_number}.png"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better practice: use Path to join instead of hard-coding the /

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix it .


prompt = f"""
The user has this request:
"{user_request}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what we can do about injection risk here.

user_request (interactive user input) and csv_preview (CSV file content that may contain adversarial data) are both interpolated unsanitized into the prompt f-string sent to the LLM. In interactive mode, a user can inject instructions that override the intended prompt, causing the model to generate arbitrary code that is then executed in a local subprocess via PythonExecutionReq(execution_tier="local").

This is especially serious because the generated code is executed as a subprocess side effect of validation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix it. Some comments are added.

strategy = ModelFriendlyRepairStrategy(loop_budget=5, requirements=all_reqs)

print("Generating code to extract data and create graph...")
generated = m.instruct(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should check for errors here.
the error that would show up later would be misleading.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from markstur June 24, 2026 01:23
return

generated_str = str(generated)
code = _extract_code_from_output(generated_str)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModelOutputThunk has no __bool__, so not generated is always False — this guard never fires. The fix is to test the string value, which the code already produces one line later anyway:

Suggested change
code = _extract_code_from_output(generated_str)
generated_str = str(generated)
if not generated_str.strip():
print(" ✗ Model failed to generate output")
return
code = _extract_code_from_output(generated_str)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment thread docs/examples/requirements/README.md Outdated
The underlying mechanism (what `PythonExecutionReq` does under the hood):

```python
def execute_python_code(code: str, timeout: int = 10) -> dict:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snippet does not reflect the actual implementation — PythonExecutionReq routes through environment.execute(code) (built from your execution_tier and CapabilityPolicy), not a bare subprocess.run. The two diverge silently as the code evolves.

A prose pointer to the source would be more durable:

PythonExecutionReq executes code via an ExecutionEnvironment built from the execution_tier and CapabilityPolicy you pass — that's what enforces the timeout and tier behaviour. See mellea/stdlib/requirements/python_reqs.py for the full implementation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

# Check if graph file was created
graph_path = Path(output_path)
if not graph_path.exists():
raise RuntimeError(f"Graph was not created at {graph_path}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two failure modes above (empty generation, failed code extraction) handle errors gracefully with print and return. This one raises, which aborts the whole batch run on the first failure.

Given the loop budget is 5 and LLM output is probabilistic, a single failed graph should not stop the remaining examples from running. Consider matching the pattern above:

Suggested change
raise RuntimeError(f"Graph was not created at {graph_path}")
if not graph_path.exists():
print(f" ✗ Graph was not created at {graph_path}")
return

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


# Sanitize user input and CSV preview to prevent prompt injection attacks.
# Use repr() to escape special characters and quote the values, preventing
# the model from interpreting user input as prompt instructions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repr() wraps the string in quotes and escapes special characters, but it does not stop the model from reading and acting on the content inside them. "Prevent prompt injection" overstates what this does — it reduces accidental prompt formatting breakage, which is still useful, but worth calling it that rather than a security control.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are updated.

print("Generating code to extract data and create graph...")
generated = m.instruct(
prompt,
requirements=all_reqs, # type: ignore[arg-type]

@planetf1 planetf1 Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_reqs is already passed to ModelFriendlyRepairStrategy — when both are provided, the strategy takes precedence and the requirements= arg on instruct() is redundant. Since this is example code people will copy, it is worth being accurate: pass requirements in one place only (the strategy is the right home, since that is where repair feedback is built).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took it out.

epilog="""
Examples:
# Run with default sample data and predefined requests
python code_generation_and_execution.py

@planetf1 planetf1 Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project convention (AGENTS.md §1) is uv run python rather than plain python — and since this is example code people will run directly, it matters: plain python may silently pick up a different interpreter or environment. Applies to all four invocations in this epilog.

2. Extract the data as specified in the user request
3. Use matplotlib with headless backend (set matplotlib.use('Agg') at start)
4. Create the visualization (graph type) specified by the user
5. Save the graph to {output_path} using plt.savefig(\'{output_path}\')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the backslash-escaped quotes around {output_path} are unnecessary inside a """ f-string — single quotes work fine without escaping here.

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things to fix before merge:

  1. mellea[plotting] does not exist — the ImportError message points users at uv pip install mellea[plotting], which fails because there is no such extras group in pyproject.toml. The correct install is uv pip install matplotlib numpy (already in the README prerequisites).

  2. README code snippet does not match the implementation — the execute_python_code block under "How Code Execution Works" uses subprocess.run directly and omits the ExecutionEnvironment abstraction that actually enforces execution_tier and CapabilityPolicy. See inline comment for a suggested replacement.

Remaining inline comments (dead guard, RuntimeError in batch mode, repr() comment, double-wired requirements, argparse epilog) are improvements worth addressing but not hard blockers.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from planetf1 June 24, 2026 15:11
@akihikokuroda

Copy link
Copy Markdown
Member Author

@planetf1 I addressed all your comments. Please review. Thanks!

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@planetf1

Copy link
Copy Markdown
Contributor

LGTM — all comments addressed from my side. Leaving formal approval to reviewers who still have open threads.

@planetf1 planetf1 dismissed their stale review June 25, 2026 14:28

Dismissing changes needed review - other approvers can review and check open threads - specifically need to ensure extras are correct

@markstur markstur dismissed their stale review June 25, 2026 22:00

fixed enough to unblock

@markstur markstur left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline.

You should add a break instead of reading the whole file to build a preview. I guess that's not blocking but worth improving.

Otherwise I think my comments were addressed and I think Nigel's were(?)

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda

Copy link
Copy Markdown
Member Author

@markstur Nigel said

LGTM — all comments addressed from my side. Leaving formal approval to reviewers who still have open threads.

@markstur markstur dismissed planetf1’s stale review June 26, 2026 16:47

@planetf1 already said changes were lgtm and dismissed

@markstur markstur added this pull request to the merge queue Jun 26, 2026
Merged via the queue into generative-computing:main with commit b59753f Jun 26, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tool / Requirement Examples and evaluation loop

5 participants