Run the Manager locally (make dev) via a NodeApi abstraction + DummyApi#353
Open
runleveldev wants to merge 18 commits into
Open
Run the Manager locally (make dev) via a NodeApi abstraction + DummyApi#353runleveldev wants to merge 18 commits into
runleveldev wants to merge 18 commits into
Conversation
Admin-defined default environment variables were merged into a container's DB record when the container was created. This froze the defaults to their creation-time values and exposed them in the edit UI. Now the container record stores only user-defined env vars. System defaults (from Settings) and NVIDIA GPU defaults are merged in at configure-time, just before the Proxmox API call, with user-defined values taking precedence. - models/container.js: buildLxcEnvConfig() now merges defaults < NVIDIA defaults < user vars; add getSystemDefaultEnvVars(), parseEnvironmentVars(), and nvidiaDefaultEnvVars() helpers. - bin/create-container.js: merge system defaults at configure-time and stop writing merged env/entrypoint back into the DB record. - bin/reconfigure-container.js: merge system defaults on edit so newly added defaults also reach existing containers. - routers/api/v1/containers.js: stop baking NVIDIA defaults into the saved record on create. Closes #349
Move all environment-variable merge logic into Container#buildLxcEnvConfig so there is exactly one place that determines the env to deploy. The method now loads the admin-defined system defaults itself (it became async) and composes them with the NVIDIA and user-defined values: system defaults < NVIDIA defaults < user-defined values Callers in create-container.js and reconfigure-container.js are reduced to a single `await container.buildLxcEnvConfig()` call and no longer load or pass defaults themselves. getSystemDefaultEnvVars, parseEnvironmentVars and nvidiaDefaultEnvVars are now internal helpers of that method.
… the Proxmox env string Addresses PR review: env var keys were only trimmed, so a key containing '=' or NUL (or a non-string/array value) could corrupt the NUL-separated KEY=value blob sent to Proxmox (splitting one var into many, breaking parsing, or stringifying objects to "[object Object]"). Add Container.normalizeEnvVars() as the single definition of a valid env var: input must be a plain object; keys are trimmed and must match a conventional env-var name (no '=', NUL, or whitespace); values are coerced to strings, with null/undefined, objects/arrays, and NUL-containing values dropped. Apply it at every ingest/load point: - getSystemDefaultEnvVars() and parseEnvironmentVars() normalize their output. - API create/update routes serialize user env vars via a shared serializeUserEnvVars() helper that normalizes before persisting. buildLxcEnvConfig can now trust its inputs, so the per-pair guards are dropped.
buildLxcEnvConfig always added env/entrypoint to Proxmox's `delete` list when the container record had no value. On create the container is freshly cloned from a template that may carry its own env/entrypoint, so emitting `delete` actively wiped those template-provided defaults. Add a deleteMissing option (default false). Proxmox's config PUT is a partial update, so omitting a key leaves it untouched: - create-container.js uses the default (false): empty env/entrypoint are omitted, preserving whatever the cloned template provides. - reconfigure-container.js passes deleteMissing:true so clearing the last env var or removing a custom entrypoint still unsets it on the existing container. The reconfigure restart-detection already ignores the `delete` key, so a pure-delete config does not, by itself, trigger a restart.
Reconfigure uses deleteMissing, so any env/entrypoint that exists only on the cloned template (and was never stored on the container) would be unset on the first reconfigure — surprising the user. We can't re-query the template later: templates are mutable Docker refs that may change or disappear, and we don't cache them. Compromise: at creation, after the template is cloned, read its LXC config and fold its env/entrypoint into the container record as if the user had supplied them (precedence: template < user). This guarantees those values persist across future reconfigures without changing API client expectations. System and NVIDIA defaults are deliberately NOT persisted — they remain configure-time-only (merged by buildLxcEnvConfig) so they aren't frozen into the record or exposed as user values in the edit UI. - models/container.js: add Container.parseLxcEnvString() and Container#persistTemplateDefaults(). - bin/create-container.js: call persistTemplateDefaults() with the cloned template's config before building/applying the LXC env config.
Resolves #350. Container status was a static column updated only by the create/reconfigure job scripts, so it drifted from reality whenever Proxmox changed out of band or a job died mid-run. This replaces it with status computed on demand. New utils/container-status.js resolves status from three sources: - Proxmox: does the LXC exist, and is it running/stopped? - Jobs: active create/reconfigure job, or did the last create job fail? - Config: does the live LXC config match what the API server expects? Resolved statuses: running, offline, creating, restarting, failed, missing, out-of-sync (plus unknown when Proxmox is unreachable). New endpoint: GET /api/v1/sites/:siteId/containers/:id/status -> { data: { status } } Non-breaking: the same live status is still embedded on the list/show/create payloads, so existing consumers keep working. The static Containers.status column is dropped (migration + model). All former writers were updated to stop persisting status; templates.js now selects provisioned containers by VMID + IPv4 instead of status='running'. The list page queries the per-container status endpoint (seeded by the list value for instant paint, polling while creating/restarting). OpenAPI and README updated.
Live status is already embedded in the existing list/show/create container payloads, so the dedicated GET /containers/:id/status endpoint is redundant. Per-row polling + hydration on the index page is not needed at this time, so revert the ContainerStatusBadge to a simple presentational badge that renders the live status from the list response. Removes the endpoint, the client query/key/result type, the OpenAPI path, and the README endpoint section (the status-value reference is retained as a field description). The ContainerStatus enum and the live resolver remain.
- out-of-sync is a warning, not a danger, badge variant - templates.js routes containers by ipv4Address only (a VMID isn't required to route)
- container-status.js: match --container-id as a whole argument when looking up
create/reconfigure jobs, so container 12 no longer matches container 123's
jobs (which could misclassify status). LIKE narrows at the DB level; a regex
confirms the id is terminated by a space or end-of-string.
- resource-requests.js: the applyResourceToExistingContainers query filtered on
the now-removed Containers.status column ("Unknown column 'status'" after the
migration). Use containerId != null (a VMID is required to reconfigure anyway).
- README: note the create endpoint also returns the live status field.
The calling convention for create/reconfigure job commands is stable and deterministic, so match the command string exactly in the query rather than fetching LIKE candidates and regex-filtering in JS: - create: exact match on `node bin/create-container.js --container-id=<id>` (create commands never carry trailing args). - reconfigure: exact match OR `<base> %` (exact base followed by a space), which covers the optional ` --<resource>=<value>` flags while still preventing id 12 from matching id 123.
Resolving the container list status was O(N) DB queries because every container ran its own create/reconfigure job lookup (up to 2 queries each). Add prefetchLatestJobs() which fetches the latest create + reconfigure job for the whole set of containers in a single query (bounded WHERE clauses, independent of N), buckets them per container, and feeds the result into computeContainerStatus via an optional `jobs` param. computeContainerStatuses now issues one job query total instead of ~2N. The single-container show path is unchanged (still queries per container when no prefetched jobs are passed).
…ng, out-of-sync)
Resolving the container list was still slow because each container did per-item
work: a command-string job lookup for 'restarting', and a Proxmox lxcConfig call
for 'out-of-sync'. Both are inherently O(N) (Proxmox has no bulk config endpoint),
which dominated index latency.
Drop both statuses:
- 'restarting': not used by known API clients (launchpad only checks == running),
and reconfigure jobs have no FK so detecting it required command-string queries.
- 'out-of-sync': required one lxcConfig call per in-Proxmox container; no bulk
Proxmox config API exists to batch it.
Status now derives only from:
- Proxmox run-state, via one clusterResources('lxc') snapshot per node, and
- the create job, resolved from the eager-loaded creationJob association (or the
creationJobId FK) — no command-string queries.
Result: the index makes one Proxmox call per node and zero per-container Proxmox
or DB job queries (verified at 30 containers). Remaining statuses: running,
offline, creating, failed, missing, unknown. PUT still enqueues a reconfigure job
and returns its jobId; only the 'restarting' status label is removed.
Updates the resolver, containers router (eager-load creationJob on the index),
client types/badges, OpenAPI enum, and README.
e27a110 to
9f7346a
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR makes the Manager runnable standalone on localhost via make dev, adding a provider abstraction (Node.api() dispatching by nodeType) and a DummyApi so container creation can exercise the real job-runner path without needing Proxmox.
Changes:
- Add
nodeTypewithNode.api()dispatch (proxmoxvsdummy) and implementDummyApito simulate hypervisor calls. - Introduce
make dev+bin/dev-bootstrap.jsto bootstrap a local SQLite dev environment and run server + job-runner together. - Add a
patch-packagebackport patch for Sequelize SQLite composite-index uniqueness behavior and gate Sequelize SQL logging behindLOG_LEVEL=trace.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Updates docs pointer to highlight make dev workflow. |
| mie-opensource-landing/docs/developers/development-workflow.md | Adds a dedicated make dev section and reorganizes Docker-stack guidance. |
| Makefile | Adds dev target to install, migrate, bootstrap, build, and run server + job-runner. |
| create-a-container/utils/dummy-api.js | Introduces DummyApi implementing the NodeApi/ProxmoxApi method surface for local runs. |
| create-a-container/routers/api/v1/containers.js | Makes node selection provider-agnostic and routes delete teardown through node.api(). |
| create-a-container/patches/sequelize+6.37.8.patch | Adds a patch-package patch for Sequelize v6 SQLite query-interface uniqueness handling. |
| create-a-container/package.json | Adds postinstall patch-package hook and updates db migration script to sequelize-cli. |
| create-a-container/package-lock.json | Locks patch-package and its transitive dependencies. |
| create-a-container/models/node.js | Adds nodeType field and Node.api() dispatch to DummyApi vs ProxmoxApi. |
| create-a-container/migrations/20260605000000-add-node-type-to-nodes.js | Adds nodeType column with default proxmox. |
| create-a-container/example.env | Documents LOG_LEVEL behavior for SQL query logging. |
| create-a-container/config/config.js | Gates Sequelize SQL logging behind LOG_LEVEL=trace (including production). |
| create-a-container/bin/dev-bootstrap.js | Seeds an empty DB with a localhost site, dummy node, and localhost external domain. |
Files not reviewed (1)
- create-a-container/package-lock.json: Generated file
…figure' into 329-rebased
… + DummyApi
Run the Manager standalone on localhost (SQLite, `make dev`) and create
containers without Proxmox, keeping the real provisioning code path.
- NodeApi abstraction: Node.api() dispatches on a new `nodeType` column.
`proxmox` (default) returns ProxmoxApi; `dummy` returns a new DummyApi that
implements the same surface and simulates Proxmox calls. Creation still runs
POST /containers -> Job -> job-runner -> bin/create-container.js -> node.api().
- DummyApi.clusterResources('lxc') derives the running set from the DB so the
live container-status resolver (#352) reports dummy containers correctly;
dev-bootstrap gives the dummy node placeholder creds so nodeHasCreds() passes.
- Provider-agnostic node selection that only excludes unprovisionable nodes
(dummy, or proxmox with apiUrl/tokenId/secret); DELETE routes teardown through
node.api(). Node.api() validates apiUrl alongside credentials.
- Postgres-safe migration adding `nodeType` (existing rows default to proxmox).
- Fix the erroneous UNIQUE on Containers.nodeId via patch-package (v6 backport of
sequelize/sequelize#17583); replaces the SQLite-specific migration.
- bin/dev-bootstrap.js seeds a localhost site + dummy node + external domain,
no-op if any Site exists, so it never clobbers the docker-compose bootstrap.
- Make idempotent seeders (groups, push-notification) so re-running them in
`make dev` doesn't violate unique constraints.
- Gate Sequelize SQL query logging behind LOG_LEVEL=trace; off otherwise.
- Replace the unmaintained sqlite3 with @vscode/sqlite3 (dev-only, builds from
source via dialectModule) so there is no prebuilt-binary glibc mismatch; prod
(Postgres) never installs or loads it.
- `make dev` lives in create-a-container/Makefile (root delegates): self-contained
install (--no-package-lock, so it never rewrites the lockfile), migrate, seed,
then run server (nodemon) + job-runner + client watch together.
- Docs: "Run the Manager Locally (make dev)" section; README + release-pipeline
pointers; LOG_LEVEL documented in example.env.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Merge #351 and #352 first (this branch already contains them, rebased on current
mainwhich includes #347).Why
Supersedes #329. That PR made the Manager runnable on
localhostbut did so in ways flagged in review: a mock that bypassed the job-runner, in-band signaling (apiUrl === 'local'), a SQLite-specific migration, and dev-login provisioning that clobbered the Docker Compose bootstrap.This reworks the same goal — run the Manager standalone and create containers without Proxmox — but addresses the root causes and keeps the real provisioning code path intact. It also lays the groundwork for additional production-capable node providers behind a single
NodeApiabstraction (starting with the dummy provider).What changed
1.
NodeApiabstraction (ProxmoxApi+ newDummyApi)Node.api()now dispatches on a newnodeTypecolumn:proxmox(default) → authenticatedProxmoxApi(unchanged behavior).dummy→ newDummyApi(utils/dummy-api.js) implementing the same method surface, simulating the Proxmox calls.POST /containers→Job→job-runner.js→bin/create-container.js→node.api(). The full creation path is exercised; only the hypervisor calls are faked. Running the job-runner is required (andmake devdoes so).DummyApi.clusterResources('lxc')derives the running set from the DB so the live container-status resolver (Container status API: compute live status instead of static DB column #352) reports dummy containers asrunning; the dev bootstrap gives the dummy node placeholder credentials so the resolver routes throughnode.api().2. Provider-agnostic-ish node selection
Node selection only excludes nodes that can't provision (it picks a dummy node, or a node with full
apiUrl/tokenId/secret), so a half-configured node is never selected and then failed innode.api(). The DELETE path likewise routes VM teardown throughnode.api().Node.api()validatesapiUrlalongside the credentials.3.
nodeTypecolumn (Postgres-safe)New migration
20260605000000-add-node-type-to-nodesaddsnodeType(STRING(50), default'proxmox'). Existing rows backfill toproxmoxvia the column default;Node.bulkCreate(node import) also applies it, so production node selection is unchanged.4. Root-cause fix for the erroneous
UNIQUEonContainers.nodeId(via patch-package)A v6 backport of sequelize/sequelize#17583 (
patches/sequelize+6.37.8.patch) so SQLite'schangeColumn/describeTableno longer marks members of a composite index asunique. Wired viapatch-package(postinstall). Replaces the SQLite-specific migration.5. Local dev environment out of
/auth/devNew
bin/dev-bootstrap.jsseeds alocalhostsite, adummynode, and alocalhostexternal domain. It no-ops if any Site already exists, so it never collides with the Docker Compose bootstrap. The/auth/devlogin route is left clean. The groups + push-notification seeders are made idempotent so re-running them undermake devdoesn't violate unique constraints.6. SQL query logging gated by
LOG_LEVEL=traceSequelize query logging (noisy — the job-runner polls the
Jobstable every tick) is gated behindLOG_LEVEL=trace, in both dev and production; off otherwise. Pass it through:make dev LOG_LEVEL=trace.7.
@vscode/sqlite3instead of the unmaintainedsqlite3The dev SQLite driver is the actively-maintained
@vscode/sqlite3fork, wired in via Sequelize v6dialectModule. It builds from source (no prebuilt-binary glibc mismatches that brokemake devon LTS distros) and is a devDependency required lazily only when the sqlite dialect is selected — so production (Postgres) never installs or loads it (npm ci --omit=devskips the native build entirely).8.
make devLives in
create-a-container/Makefile(the rootMakefiledelegates). Self-contained: installs server + client deps withnpm install --no-package-lock(so it never rewrites the lockfile), runs migrations, seeds the localhost site + dummy node, builds the client, then runs server (nodemon) + job-runner + client watch together. No.envis needed — the server's defaults (SQLite atdata/database.sqlite, an auto-generated session secret, dev login when not in production) are sufficient.db:migrateis fixed to callsequelize-cli.9. Docs
New "Run the Manager Locally (
make dev)" section in the development-workflow doc; README + release-pipeline pointers updated;LOG_LEVELdocumented inexample.env.How this addresses the #329 review comments
NodeApiinterface +DummyApi, with the job-runner still in the loopDummyApiimplements theProxmoxApisurface; creation runs through the job-runner andbin/create-container.jsunchangedapiUrl === 'local'); want a dedicated fieldnodeTypecolumn drivesNode.api()/auth/devintobin/dev-bootstrap.js, which no-ops if any Site exists.gitignoreentrymake devuses the defaultdata/database.sqlite(already ignored bycreate-a-container/data/.gitignore)Also addressed the Copilot review on this PR (provisionable-node filter + clearer error,
Node.api()apiUrl validation, conditionalLOG_LEVELforwarding in the Makefile, larger non-reused DummyApi VMID space, real locally-administered MAC prefix, patch-package moved so--omit=devinstalls don't fail).Testing
remove-container-status);Containershas no column-levelUNIQUEonnodeId.bin/create-container.jsagainstDummyApilands the container, and the live status resolver reports itrunning(andcreating/failedfor the other states); DELETE routes throughnode.api()and removes the row.npm ci(dev) andnpm ci --omit=dev(production, no@vscode/sqlite3, no native build) both succeed.make devruns server + job-runner + client watch and does not modify the lockfile.Notes
main(which includes Debian packaging via per-component Makefiles + fpm #347); contains Apply default env vars at configure-time rather than at save #351 + Container status API: compute live status instead of static DB column #352, with no Debian packaging via per-component Makefiles + fpm #347 commit duplication.Closes #329