feat(spec): project steering UI + cecli scan/scaffold (dev-integration) #6
JessicaMulein wants to merge 64 commits into
Submodule at 91540f39d (cherry-picked from pr/multi-repo-context): path: projects, .cecli.workspaces.yml walk-up, tests. Co-authored-by: Cursor <cursoragent@cursor.com>
…e split
- Open-project gate (#50): vision-current-project, ProjectBar, e2e helpers
- Multi-repo (#48 partial): .cecli.workspaces.yml, Vision API, Settings UI
- Tasks aligned with open project (#49); workspace path utils
- brightdate-python submodule; turn metrics; test-lab timing updates
Co-authored-by: Cursor <cursoragent@cursor.com>
Submodule includes multi-repo workspaces (91540f39d) and UpdateTodoList normalize_json_array (5c79f96b5). Co-authored-by: Cursor <cursoragent@cursor.com>
Submodule at 6974c2e58: rebased local multi-repo workspaces and UpdateTodoList JSON hardening for integration testing. Co-authored-by: Cursor <cursoragent@cursor.com>
Route POST /sessions through Tauri/reqwest so WebKit fetch no longer fails with "Load failed", and harden engine spawn: wait for health, kill stale :8741 listeners, resolve install root from the binary, and align Settings paths when the repo lives under both /Users and /Volumes. Co-authored-by: Cursor <cursoragent@cursor.com>
… routing Repair /agent when Tasks prepends checklist context: rebuild slash preproc input, apply unlimited agent timeout (not VISION_SLASH_PREPROC_TIMEOUT_S), finalize with prose-shell recovery, auto-confirm, and empty-turn warnings. Route desktop WebKit POST/SSE through Tauri reqwest; free :8741 on Quit; add agent guard UI, workspace path filters, and troubleshooting notes. Co-authored-by: Cursor <cursoragent@cursor.com>
Import parse_tool_arguments from cecli.helpers.responses (not helpers.py). Pin cecli @ 383b6fd8b — Grep format_output hardening for local-model searches. Co-authored-by: Cursor <cursoragent@cursor.com>
Persist run options, stable ETC anchors, live test progress chips, and copy step logs once a step leaves pending. Add GPU stall abort, historical GPU warnings, short-circuit fail-fast, and transcript reveal in Finder. Harden LLM SSE parsing, block browser opens during suite/LLM runs, and fix e2e helpers plus Ollama warmup for llm:core. Co-authored-by: Cursor <cursoragent@cursor.com>
Show a lightning bolt for steps aborted by short-circuit, document Jun 2026 Test Lab ship notes in ROADMAP #46, and target spec-layer tabs in the wizard e2e so Tasks nav no longer steals Playwright clicks. Co-authored-by: Cursor <cursoragent@cursor.com>
Add a two-row progress header with live step elapsed time and start stamp, step-type chip icons, and Run ETC ordering fix. Speed up activate.sh for lab/vision launchers, fix agent done on continuation turns, and resolve E2E_PYTHON absolutely in suite steps and e2e helpers. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: cecli (ollama_chat/qwen3.6:27b-q4_K_M)
Co-authored-by: cecli (ollama_chat/qwen2.5-coder:7b)
…d fix. Test Lab gains suite resume, Playwright line-reporter sub-step tracking (without pytest START noise), and orchestrator/runner reliability. Chat improves user message rendering, Cmd+F find, and applied-edit chips when turn timing is absent. E2E LLM specs tolerate long slash-command turns; SSE client guards null events. Venv launchers use ensure-venv + pytest import probe so verify:ears and Lab steps self-heal. Pin cecli @39085bc (gitignore path resolve for superproject /add). Co-authored-by: Cursor <cursoragent@cursor.com>
Add explore-and-deepen spec generation (#53), spec job debug endpoints/chips, token-limit auto-continue, spec wizard UX fixes, chat activity overlay and slash completion, and PEP 440 git-describe for pip install -e . (v0.2.1-bright5). Co-authored-by: Cursor <cursoragent@cursor.com>
Expose configurable spec-generation wall/turn timeouts in Settings and Tasks. Lean spec-focus inject on implement turns (/agent, truncated design) to avoid re-sending full specs every message. Auto-continue /agent after stalled exploration (empty Ollama, repetition on read tools) and block spurious local token-limit auto-continue. Pin cecli dev-integration (EditText LIST_PARAMS + ReadRange empty-file hint); document upstream PR workflow. Co-authored-by: Cursor <cursoragent@cursor.com>
…v0.100.5 Pin cecli at f2ad1c75c (validation pipeline + ReadRange hint, add.py staging, repomap path resolve). Chat groups tool call/args/range/result into cards with error highlighting. Document fork rebuild in CECLI_UPSTREAM_PR.md. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Cecli v0.100.5 FileSystemService calls these on the repo facade; superproject workspaces (e.g. brightdate-rust) use bright_vision_core.RepoSet and failed POST /sessions with 400 until these methods were implemented. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Document `brew trust --cask digital-defiance/tap/brightvision` for Homebrew 4.6+. Drop stray `>>>>>>> Stashed changes` line from the architecture table. Co-authored-by: Cursor <cursoragent@cursor.com>
Ground implement/resume turns with lib/test snapshots and flutter test verification; abort ls/repetition/read-range dead ends; default heavy keep_alive to -1 so router models stay loaded between agent LLM calls. Pin cecli dev-integration (ReadRange int marker coerce). Co-authored-by: Cursor <cursoragent@cursor.com>
Add fast/code/think routing with per-hopper think toggles and LiteLLM extra_params, env-driven tier defaults, and SSE model_route snapshots. Replace the system router chip with a colored left edge on assistant replies (with inherited route across tool breaks), fix post-tool stream deduplication, and align Settings hopper UI and tests/docs for release. Co-authored-by: Cursor <cursoragent@cursor.com>
…ntract
Pin cecli submodule to a653ce9f0 (dev-integration), which carries the
Kiro-style agent prompt rewrite + explicit edit contract and the
{final_reminders}/sub-agent-inheritance fixes (upstream PR cecli-dev/cecli#566).
Adds the BrightVision-side prompt-quality eval harness:
- bright_vision_core/agent_eval.py: objective behavioral scorer reusing the
agent_turn.py signal parsers (edit failures, ReadRange-before-edit, ls-spam,
token limit, rounds) + tests/core/test_agent_eval.py.
- bright_vision_core/agent_judge.py: opt-in LLM-as-judge rubric (scope,
directness, investigation, summary quality) with robust JSON parsing +
tests/core/test_agent_judge.py.
- tests/core/test_agent_prompt_eval.py + 'eval:prompts' script: real-Ollama
behavioral eval scoring one scoped edit turn (E2E_LLM, BV_PROMPT_JUDGE).
- docs: ROADMAP #54, TESTING 'Measuring prompt quality' section; .gitignore for
the regenerated eval workspace.
… prioritizing thinking models and ensures that thinking configurations are not overridden by stale `localStorage` values. It also includes improvements to specification documentation and E2E test reliability.

**1. Model Router Enhancements (`bright_vision_core/model_router.py`)**
- **Think Model Preference:** Added `prefer_think` configuration to `ModelRouterConfig` and logic to determine this preference based on pool configuration (`pool_prefers_think`).
- **Env Var Overrides:** Added functions (`_parse_env_bool`, `_read_local_llm_env_bool`, `_apply_env_think_to_pool`) to force `CODE_THINK` and `FAST_THINK` settings from local environment files, bypassing potentially stale frontend settings.
- **Classification Logic:** Updated `classify_prompt` to prioritize "think" models for non-tool turns when `prefer_think` is active, while continuing to enforce the use of "code" models for tool-use turns (agent commands, `/agent` slash command).

**2. Specification Documentation (`bright_vision_core/spec_layers.py`, `bright_vision_core/todo_spec_generate.py`)**
- **Improved Guidance:** Updated `todo_spec_generate.py` with more rigorous, professional guidelines for requirements, design, and task generation. It now emphasizes completeness, concrete examples, and adherence to specific section structures.
- **Assessment Rigor:** Increased the threshold in `assess_spec_richness` for requiring more detailed requirements and acceptance criteria.

**3. E2E Testing Improvements**
- **Ollama Warmup:** Added a new global setup script (`e2e/global-llm-setup.ts`) to pre-warm Ollama models before test suites, preventing cold-start stalls that caused test failures.
- **Resilient Assertions:** Updated E2E helpers (`e2e/helpers/llmChat.ts`, `e2e/agent-llm.spec.ts`) to use a more flexible `expectLatestAssistantSettled` approach, allowing for minor variations in model responses (like paraphrasing) rather than enforcing exact string matches.
- **Environment Cleanup:** Added `clearLlmE2eWorkspaceTodos` to ensure tests start with a clean workspace, preventing auto-injected task specs from interfering with router tier assertions.
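
To make the router behaviour described above easier to review, here is a minimal sketch of the two ideas: an environment value that overrides a stored (possibly stale) setting, and a tier choice that keeps tool turns on the code tier. Every name in it (`parse_env_bool`, `resolve_think_flag`, `RouterConfig`, `classify_tier`) is hypothetical and is not the code in `bright_vision_core/model_router.py`. The fallback to a "fast" tier when `prefer_think` is off is an assumption based on the fast/code/think tiers named in an earlier commit.

```python
from __future__ import annotations

from dataclasses import dataclass

# Accepted spellings are an assumption; the real parser may differ.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_env_bool(raw: str | None) -> bool | None:
    """Return True/False for a recognised value, None if unset or unrecognised."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return None


def resolve_think_flag(env_value: str | None, stored_value: bool) -> bool:
    """Let a recognised env value win over the stored frontend setting."""
    parsed = parse_env_bool(env_value)
    return stored_value if parsed is None else parsed


@dataclass
class RouterConfig:
    # Hypothetical stand-in for the real router configuration.
    prefer_think: bool = False


def classify_tier(prompt: str, config: RouterConfig, *, uses_tools: bool = False) -> str:
    """Pick a tier: tool turns stay on 'code', otherwise honour prefer_think."""
    if uses_tools or prompt.lstrip().startswith("/agent"):
        return "code"
    if config.prefer_think:
        return "think"
    return "fast"  # assumed default tier
```

With this shape, `resolve_think_flag("0", True)` yields `False`, so a value in the local env file beats whatever the frontend persisted, while an unset or unparseable value leaves the stored setting alone.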
…ve session
lifecycleActive includes isRunning, so the mismatch banner disabled Stop & Start whenever a session was running. Use isSessionRestartInFlight for the banner and reset process state when Start fails. Co-authored-by: Cursor <cursoragent@cursor.com>
Cherry-pick 3a84e9f37 (upstream PR cecli-dev/cecli#573). Tool cards show collapsible output for all tools with multiple result chunks. Co-authored-by: Cursor <cursoragent@cursor.com>
Wire cecli steering scan/scaffold through Vision HTTP, vision-client, and a SteeringFilesHint panel on Tasks + Spec (status, open, create template). Add mocked e2e, verify:ears/bright-core gates, and yarn test:lab in dogfood-check. Pin cecli pr/spec-development at efe29d0ed. Co-authored-by: Cursor <cursoragent@cursor.com>
Review skipped (CodeRabbit)
Too many files! This PR contains 293 files, which is 143 over the limit of 150.
Files ignored due to path filters: 7. Files selected for processing: 293.
Code Review by Qodo
1. Verify commands run via shell
CI Feedback 🧐
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
PR Summary by Qodo
Spec-driven dev integration: steering UI + steering scan/scaffold API (cecli)
Walkthroughs
Description
• Add Project steering hint UI (Active/Missing) with Open and Scaffold template actions.
• Expose steering scan/scaffold over Vision HTTP and typed vision-client helpers.
• Pin cecli engine + extend test gates (core/ears/e2e) to cover steering workflows.
Diagram
graph TD
    UI["Desktop UI"] --> Hint["SteeringFilesHint"] --> Client["CoreHttpClient"] --> API["Vision HTTP API"] --> Logic["Steering scan/scaffold"] --> FS[("Workspace .cecli/")]
    subgraph Legend
        direction LR
        _ui["UI"] ~~~ _api(["API/Client"]) ~~~ _fs[("Filesystem")]
    end
High-Level Assessment
The following are alternative approaches to this PR:
1. Desktop-only steering scaffold (no Vision HTTP)
2. Wrap cecli CLI commands from core instead of importing APIs
Recommendation: The chosen approach (Vision HTTP endpoints returning structured scan/scaffold results backed by cecli's canonical spec implementation) best preserves cross-client parity and avoids duplication. Consider caching scan results later if filesystem scans become hot-path.
File Changes
Enhancement (6)
Refactor (1)
Tests (2)
Other (3)
    proc = subprocess.run(
        command,
        shell=True,
        cwd=str(Path(workspace).resolve()),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
except subprocess.TimeoutExpired:
    return False, f"verify command timed out after {int(timeout_s)}s: {command}"
except OSError as e:
    return False, f"verify command failed to start: {e}"

output = ((proc.stdout or "") + (proc.stderr or "")).strip()
1. Verify commands run via shell 📘 Rule violation ⛨ Security
The verify gate extracts verify: strings from tasks_md and executes them via subprocess.run(..., shell=True) without validation or an allowlist, enabling command injection and arbitrary code execution in the workspace context. Because verification is enabled by default (BV_IMPLEMENT_VERIFY) and invoked automatically after steps, LLM-authored or modified tasks_md can trigger arbitrary shell commands without explicit user intent.
Agent Prompt
## Issue description
`run_verify_command()` executes a `verify:` command extracted from `tasks_md` using `subprocess.run(..., shell=True)`, and verification is enabled by default. Because the command content is effectively user-controlled (LLM/spec/task markdown) and runs automatically after an implement step when a `verify:` line exists, this creates a command-injection/arbitrary local code execution path.
## Issue Context
Compliance requirements disallow passing unsanitized input into shell/process execution paths and expect validation/sanitization and preferably allowlisting. Here, the verify command is parsed from markdown (`tasks_md`) and executed through a shell (`shell=True`), and the session flow will invoke verification automatically whenever a `verify:` entry exists for the active step.
## Fix Focus Areas
- bright_vision_core/implement_verify.py[35-56]
- bright_vision_core/implement_verify.py[62-71]
- bright_vision_core/implement_verify.py[122-150]
- bright_vision_core/session.py[892-909]
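
One possible shape for the remediation, offered as a sketch rather than the project's fix: parse the `verify:` string into an argv list with `shlex.split()`, check the executable against an allowlist, and run it with `shell=False`. The helper names `parse_verify_command` and `run_verify_argv` and the allowlist contents are assumptions for illustration. An allowlist narrows what can run but is not a sandbox, since tools such as `yarn` can still execute project-defined scripts.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical allowlist; a real project would choose and document its own.
ALLOWED_VERIFY_EXECUTABLES = frozenset({"pytest", "flutter", "yarn", "cargo"})


def parse_verify_command(command: str) -> list[str]:
    """Split a verify string into argv and reject unknown executables."""
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty verify command")
    if argv[0] not in ALLOWED_VERIFY_EXECUTABLES:
        raise ValueError(f"executable not allowed: {argv[0]!r}")
    return argv


def run_verify_argv(command: str, workspace: str, timeout_s: float = 120.0) -> tuple[bool, str]:
    """Run an allowlisted verify command without a shell."""
    try:
        argv = parse_verify_command(command)
    except ValueError as e:
        return False, f"verify command rejected: {e}"
    try:
        proc = subprocess.run(
            argv,
            shell=False,  # operators such as ; | > are passed as plain arguments
            cwd=str(Path(workspace).resolve()),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False, f"verify command timed out after {int(timeout_s)}s: {command}"
    except OSError as e:
        return False, f"verify command failed to start: {e}"
    output = ((proc.stdout or "") + (proc.stderr or "")).strip()
    return proc.returncode == 0, output
```

Because no shell is involved, a string like `pytest -q && rm x` no longer chains commands: `&&`, `rm`, and `x` become literal arguments to `pytest`, and a string whose first token is not allowlisted is rejected before anything runs.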
async fn vision_api_request_json(
    method: reqwest::Method,
    base_url: &str,
    path: &str,
    bearer_token: Option<&str>,
    body: Option<Value>,
    timeout_secs: u64,
) -> Result<VisionApiResponse, String> {
    let base = base_url.trim().trim_end_matches('/');
    let path = path.trim_start_matches('/');
    let label = format!("{} /{path}", method);
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(timeout_secs))
        .build()
        .map_err(|e| e.to_string())?;
    let url = format!("{base}/{path}");
    let mut req = client.request(method, &url);
    if body.is_some() {
        req = req.header("Content-Type", "application/json");
    }
    if let Some(payload) = body {
        req = req.json(&payload);
    }
    if let Some(token) = bearer_token {
        let trimmed = token.trim();
        if !trimmed.is_empty() {
            req = req.header("Authorization", format!("Bearer {}", trimmed));
        }
    }
    let res = req.send().await.map_err(|e| format!("{label}: {e}"))?;
    let status = res.status().as_u16();
    let text = res.text().await.unwrap_or_default();
    let body = if text.trim().is_empty() {
        Value::Null
    } else {
        serde_json::from_str(&text).unwrap_or(Value::String(text))
    };
    Ok(VisionApiResponse { status, body })
}
2. vision_api_fetch allows arbitrary URLs 📘 Rule violation ⛨ Security
New Tauri commands accept base_url and path from the webview and issue backend reqwest requests without restricting them to the local Vision API, potentially bypassing CSP expectations and enabling SSRF-style access if the webview is compromised.
Agent Prompt
## Issue description
The new `vision_api_fetch*` Tauri commands allow the frontend to provide an arbitrary `base_url` (and `path`) that the Rust backend will request using `reqwest`. This can bypass frontend CSP/network constraints and should be constrained to the intended local Vision API target.
## Issue Context
These commands are meant as a workaround for WebKit fetch failures to localhost, but they should not become a generic network proxy.
## Fix Focus Areas
- src-tauri/src/main.rs[730-767]
- src-tauri/src/main.rs[770-796]
- src-tauri/src/main.rs[805-833]
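
The constraint asked for here amounts to a small predicate on `base_url`. The sketch below shows that check in Python, matching the other illustrative snippets in this thread; the actual fix would live in the Rust command handlers. The helper name `is_allowed_vision_base_url`, the loopback host set, and the use of port 8741 (the listener port mentioned in the engine-spawn commits) are all assumptions.

```python
from urllib.parse import urlsplit

# Assumed targets: loopback hosts only, on the engine port named in the commits.
ALLOWED_HOSTS = frozenset({"127.0.0.1", "localhost", "::1"})
ALLOWED_PORT = 8741


def is_allowed_vision_base_url(base_url: str) -> bool:
    """Accept only plain-HTTP loopback URLs on the expected port."""
    try:
        parts = urlsplit(base_url.strip())
        port = parts.port  # raises ValueError for out-of-range ports
    except ValueError:
        return False
    if parts.scheme != "http":
        return False
    # Reject userinfo tricks such as http://127.0.0.1:8741@evil.example/
    if parts.username or parts.password:
        return False
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    if port != ALLOWED_PORT:
        return False
    # The base URL should carry no path, query, or fragment of its own.
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        return False
    return True
```

Comparing the parsed hostname, rather than checking the raw string prefix, is the design point: a prefix test would accept `http://127.0.0.1:8741@evil.example/`, whereas the parsed form exposes `evil.example` as the real host.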
_UNSAFE_SHELL = re.compile(
    r"[;&`$]|(?<![-\w/])>(?!\s)|\brm\b|\bmv\b|\bsudo\b|"
    r"(?:^|\s)curl\s+(?:https?://|ftp://)|\bwget\b|\bchmod\b|\bchown\b",
    re.IGNORECASE,
)
5. Shell redirect bypass 🐞 Bug ⛨ Security
_UNSAFE_SHELL fails to block common redirection forms like > file and 2>file, but run_prose_shell_recovery() still executes the command with shell=True. This allows LLM-provided “read-only” prose shell fences to perform arbitrary file writes (including outside the workspace) when /agent prose-shell recovery runs.
Agent Prompt
## Issue description
The prose-shell recovery path executes assistant-provided shell strings with `shell=True`, but the `_UNSAFE_SHELL` regex does not block typical redirections like `> file` (with whitespace) or `2>file`, enabling file writes despite the intended “read-only” allowlist.
## Issue Context
- `/agent` may output fenced ```bash blocks.
- `Session` calls `run_prose_shell_recovery()` on those extracted commands.
- The allowlist currently permits pipes and fails to reliably reject redirection tokens.
## Fix Focus Areas
- Tighten the safety filter to reject any redirection operators (`>`, `>>`, `<`, `2>`, `&>`, etc.) and other shell metacharacters, OR
- Avoid `shell=True` entirely by parsing argv with `shlex.split()` and executing with `shell=False` (and either remove `|` support or implement safe pipelining without a shell).
- bright_vision_core/agent_turn.py[58-74]
- bright_vision_core/agent_turn.py[290-322]
- bright_vision_core/session.py[1016-1042]
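
As an illustration of the first option, a token-based filter is harder to slip past than a regex over the raw string. The sketch below tokenizes with `shlex.shlex(..., punctuation_chars=True)`, which splits out runs of `();<>|&` as their own tokens, and rejects any command containing one. The function name `is_read_only_command` is hypothetical, and the filter is deliberately conservative: it also rejects pipes and any `$` or backtick, even inside quotes. It assumes Python 3.8 or later, where `punctuation_chars` works together with `whitespace_split`.

```python
import shlex

# Characters that shlex reports as shell punctuation when punctuation_chars=True.
_OPERATOR_CHARS = frozenset("();<>|&")


def is_read_only_command(command: str) -> bool:
    """Return True only if the command has no shell operators or expansions."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes or a dangling escape: refuse rather than guess.
        return False
    if not tokens:
        return False
    for token in tokens:
        # A token made up purely of operator characters is >, >>, <, |, &&, ; etc.
        if token and set(token) <= _OPERATOR_CHARS:
            return False
        # Command substitution and variable expansion are rejected outright.
        if "`" in token or "$" in token:
            return False
    return True
```

Under this filter both `ls > out.txt` and `cat a 2>err.log` are refused, because `>` surfaces as a standalone token regardless of the surrounding whitespace. A command that passes could then be run from its argv with `shell=False`, which removes the shell from the picture entirely.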
Add Ollama/vLLM/llama.cpp routing (Rust dispatcher, Python registry, UI capabilities), mocked pytest gate, e2e local-llm-backend spec, and a dedicated Test Lab step so backend tests are visible instead of buried in dogfood. Co-authored-by: Cursor <cursoragent@cursor.com>
Move pool/classify/apply-route into cecli (65 unit tests, yarn verify:cecli-hopper), thin bright_vision_core re-exports with preload hooks, Test Lab verify:cecli-hopper step, and dev-integration cecli pin. Co-authored-by: Cursor <cursoragent@cursor.com>
… #577 Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
dev-integration cecli lost cecli.spec.progress after hopper cherry-picks; restore from pr/spec-development. Pre-commit script checks out PR branch. Co-authored-by: Cursor <cursoragent@cursor.com>
Bump cecli to 3769282b6 (cherry-picks for upstream #574 and #577). Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
All LM Studio load paths use load_with_options directly (persistent or TTL for router hopper); the thin load helper was dead code. Co-authored-by: Cursor <cursoragent@cursor.com>
55d7174 to 352ff4d
Cecli #574 spec implement (checklist paths + resume tasks), #579 ContextManager add→create, #580 reject_yield hook. Session wires yield guard and code-tier routing for implement/agent; auto-advance requires EditText. Forward LITELLM_EXTRA_PARAMS from local-llm.env; hopper prefs on Vision start; resume keeps todo inject id. Co-authored-by: Cursor <cursoragent@cursor.com>
…ix async_bridge for graceful cancellation Co-authored-by: cecli (openai/qwen/qwen3.6-35b-a3b)
cecli @ 5b3572ad8 (cherry-pick of pr/spec-development 06c484b14 for #574). Co-authored-by: Cursor <cursoragent@cursor.com>
18611f9 to 42d7394
Summary
- `SteeringFilesHint`: Active/Missing status, Open STEERING.md, Create template (desktop editor).
- cecli (`pr/spec-development` @ `efe29d0ed`): `scan_steering_files` / `scaffold_steering_files` + `bright-vision-tasks steering scan|scaffold`.
- `GET/POST …/workspaces/steering-files` (+ scaffold); vision-client helpers; `tests/core/test_http_steering_files.py`.
- `yarn verify:ears` + `yarn test:bright-core`; `yarn test:lab` in `yarn dogfood:check`; mocked e2e `e2e/tasks-steering.spec.ts`.

Latest commit: f3972c6. This branch also carries the E7 cecli spec lift (d7b49fa / #55) and related dogfood/e2e fixes already on dev-integration.

Test plan
- `yarn verify:ears` (155 passed)
- `yarn test:lab` (36 passed)
- `yarn test:fast` (351 passed)
- `yarn test:e2e e2e/tasks-steering.spec.ts` (2 passed)
- `yarn verify:cecli-spec` (143 passed)
- `yarn test:everything` or Test Lab full suite (optional; needs Ollama)

Made with Cursor