Draft: docs(training-hub): QLoRA + CPT guides + e2e (8.2.3) by typhoonzero · Pull Request #270 · alauda/aml-docs

typhoonzero · 2026-06-24T01:22:44Z

Daily-loop dev (2026-06-22). Adds runnable QLoRA + CPT tutorials, new training-hub-fine-tuning.mdx sections, and e2e cases c13/c14. The GPU smoke test was skipped (A30 saturated). See .docs/loop/worklog-2026-06-22.md. Draft.

- Narrow scope to Claude Code only; remove opencode and Codex CLI sections
- Add how to configure reasoning effort when starting the InferenceService (server-side --reasoning-effort flag and request-time override)
- Update Claude Code section with corrected proxy setup for LiteLLM and claude-code-router (config-driven, ccr code startup command)
- Qwen3.6 and Gemma 4 recommendations and Unsloth quantized model list already present; no change needed

The --default-chat-template-kwargs flag does not exist in vLLM. Replaced with accurate guidance about server-wide control via --chat-template and request-level parameters.

…/coding-agents-inference-service

- Remove list preceding code block to avoid remark-lint-code-block-split-list
- Replace Python dict literals with dict() constructor to avoid JSX parsing
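As a small illustration of the second fix, the two spellings below build the same dictionary; only the `dict()` form keeps curly braces out of the MDX source. The keys and values are made up for illustration and do not come from the guide.

```python
# Hypothetical hyperparameters, for illustration only.
literal_form = {"learning_rate": 0.001, "epochs": 3}     # braces may be read as a JSX expression in MDX
constructor_form = dict(learning_rate=0.001, epochs=3)   # same mapping, no braces in the source

same = literal_form == constructor_form
print(same)
```

The keyword form only works when every key is a valid Python identifier; keys such as `"max-steps"` would still need a different workaround.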

…/pipelines-mlflow-integration

The pipelines-mlflow-integration example did not run as written. Fixes verified against MLflow + KFP on g1-c1-x86:
- Import mlflow inside each @dsl.component (KFP v2 packages components from their own source; a module-level import raises NameError at runtime).
- Replace dsl.RUN_ID_PLACEHOLDER (removed in KFP v2) with dsl.PIPELINE_JOB_ID_PLACEHOLDER, passed in as a component argument.
- Document the secured-install access path: the mlflow-tracking-server Service fronts oauth2-proxy (302s headless clients), so components need a direct in-cluster Service, a ServiceAccount bearer token (MLFLOW_TRACKING_TOKEN), workspace RBAC, and a warm-up retry.
- Fix the Trainer v2 example (trainer.kubeflow.org/v1alpha1 TrainJob with runtimeRef/trainer, not TrainingJob/v1 with a raw pod template).
- Fix client.get_run_id -> run.run_id and the Tools menu path.
Also:
- Drop files unrelated to this PR's scope (agentic_mlops index + nav row, qwen3 finetune notebook) carried in from the coding-agents base branch.
- Remove dead _retry_kubectl_stdin_novalidate() from e2e/lib.sh.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
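The "warm-up retry" mentioned above is not spelled out in the commit message. A minimal sketch of what such a helper could look like, assuming exponential backoff is acceptable, follows; the function name, defaults, and the injectable `sleep` parameter are hypothetical.

```python
import time


def call_with_warmup_retry(fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**i seconds and retry.

    Hypothetical helper: gives up and re-raises the last error after
    `attempts` calls. `sleep` is injectable so the helper can be tested
    without real waiting.
    """
    last_error = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:  # deliberately broad in this sketch
            last_error = exc
            if i < attempts - 1:
                sleep(base_delay * (2 ** i))
    raise last_error
```

Inside a component, the first tracking call (for example, setting the experiment) could be wrapped as `call_with_warmup_retry(lambda: first_call())`, where `first_call` stands in for whatever call is made first.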

…ethod
Cross-checked against mlflow-plugin/mlflow-kubernetes-plugins:
- Name the canonical mechanism: the server's `kubernetes-auth` plugin authorizes via Kubernetes RBAC and accepts a ServiceAccount bearer token (Authorization / X-Forwarded-Access-Token) + X-MLFLOW-WORKSPACE.
- Fix caller RBAC resources to the plugin's API group set (experiments / datasets / registeredmodels); `runs` is not a resource (run writes authorize against `experiments`).
- Add the canonical out-of-cluster token path (`kubectl create token`) alongside the in-pod projected token.
- Document workspace selection via set_workspace() / MLFLOW_WORKSPACE.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per mlflow-plugin/mlflow-kubernetes-plugins/docs/authorization-plugin.md:
- Lead with the identity-token method: the server's `kubernetes-auth` plugin (user_identity_token mode) authenticates the caller from the bearer token's identity claims, authorizes that identity, and records it as the MLflow run owner. The client authenticates with the token before any API call.
- Note the credential is a Kubernetes ServiceAccount token (the platform-wide `kubectl create token` pattern; sub claim is the identity).
- Add a security warning: because user_identity_token reads claims unverified (the oauth2-proxy is the verifier), a direct endpoint must be network-restricted / not exposed via ingress, or run the server in self_subject_access_review mode.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
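To make the security warning concrete, the sketch below shows what "reading claims unverified" means in general: the payload of a JWT decodes without any key, so a forged token yields whatever `sub` its author chose. This is a generic illustration, not the plugin's code; the helper names and the example identity are hypothetical.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT checking its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A forged token: nobody signed it, yet its claims decode without complaint.
forged = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "admin@example.com"}),  # hypothetical identity
    "bogus-signature",
])
print(unverified_claims(forged)["sub"])
```

This is why something in front of the server has to verify the signature before the claims are trusted.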

…e test
Reworks the KFP + MLflow guide to authenticate with a platform user identity token only (no ServiceAccount, no per-workspace RBAC, no extra in-cluster Service):
- The MLflow kubernetes-auth plugin (user_identity_token mode) takes the caller identity from the bearer token's claims and records it as the run owner.
- Components reach MLflow through the platform Kubernetes API (…/kubernetes/<cluster>/…/pods/<pod>:5000/proxy/…) and forward identity via X-Forwarded-Access-Token; the shipped Service only exposes the browser OAuth proxy, so this avoids it without creating anything.
- Removed the direct-Service, ServiceAccount-token, and RBAC sections.
- KFP example now uses a stdlib REST helper (no mlflow SDK install needed) and passes the token as a parameter (source from a Secret).
Adds e2e/mlflow-user-identity-smoke.sh: logs a run with a user token and asserts the run owner equals the token identity. Verified on g1-c1-x86 (run owner admin@cpaas.io); the pipeline example compiles with kfp 2.11.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

New how_to/mlflow-python-sdk.mdx: how to drive the stock mlflow>=3.10 SDK against the auth + multi-tenant Alauda AI MLflow server with a platform user identity token (no ServiceAccount, no per-workspace RBAC, no extra Service).
Covers MLFLOW_TRACKING_TOKEN auth, mlflow.set_workspace, the port-forward connection to the app port (raw tunnel preserves Authorization), model registry, the smoke test, and troubleshooting (302 / token-newline / 401 / 403).
Verified on g1-c1-x86: runs are owned by the token identity. Cross-linked from mlflow.mdx Client Configuration.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
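The "token-newline" troubleshooting item presumably refers to a token read from a file or command output with a trailing newline; that reading is an assumption. A minimal sketch of a guard before exporting `MLFLOW_TRACKING_TOKEN` follows. The helper name and the placeholder token are hypothetical.

```python
import os


def clean_token(raw: str) -> str:
    """Strip surrounding whitespace and reject tokens that cannot go in a header."""
    token = raw.strip()
    if not token:
        raise ValueError("token is empty")
    if any(ch.isspace() for ch in token):
        raise ValueError("token contains embedded whitespace")
    return token


# Placeholder value with the kind of trailing newline a file read would leave.
os.environ["MLFLOW_TRACKING_TOKEN"] = clean_token("eyJhbGciOi.example.token\n")
print(repr(os.environ["MLFLOW_TRACKING_TOKEN"]))
```

Setting the variable before the first MLflow call keeps the cleaned value in effect for the whole process.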

…cess)
Rework mlflow-python-sdk.mdx so the MLflow Python client always goes through the oauth2-proxy (the platform MLflow route) instead of port-forwarding to the container port:
- Interactive: present the browser SSO session by copying the _oauth2_proxy cookie and attaching it via a runtime-registered RequestHeaderProvider (verified: the provider injects the header and the run is owned by the caller identity).
- Headless/automation: admin enables oauth2-proxy --skip-jwt-bearer-tokens, then the client uses MLFLOW_TRACKING_TOKEN with a platform OIDC token.
Removes the kubectl port-forward / app-port connection entirely.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- SDK guide "Headless / automation": mint a short-lived Dex id token from a long-lived refresh token (refresh-token grant at /dex/token), then use it as MLFLOW_TRACKING_TOKEN through the OAuth proxy. Refresh before the 24h id-token expiry instead of carrying a static token.
- Rework the smoke test to the same method: refresh token -> id token -> log to MLflow via the platform route (through oauth2-proxy, no container-port access), asserting the run owner equals the token identity. Requires the proxy's --skip-jwt-bearer-tokens.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
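The refresh-token grant described above is a form-encoded POST as defined by OAuth 2.0. A minimal sketch of building that request with the standard library follows; the host name, client credentials, and function name are placeholders, and whether the client secret goes in the body or in a Basic auth header depends on the Dex client configuration.

```python
import urllib.parse
import urllib.request


def build_refresh_request(token_url, client_id, client_secret, refresh_token):
    """Build (but do not send) an OAuth 2.0 refresh-token grant request."""
    form = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,          # assumption: credentials sent in the body
        "client_secret": client_secret,
    }
    body = urllib.parse.urlencode(form).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; the real endpoint is the platform's /dex/token.
req = build_refresh_request(
    "https://platform.example.com/dex/token",
    "example-client", "example-secret", "example-refresh-token",
)
print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` and reading `id_token` from the JSON response would yield the value to export as `MLFLOW_TRACKING_TOKEN`, assuming the server returns an id token on refresh.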

- SDK guide "Headless / automation": mint a Dex id token with the OAuth2 password grant (grant_type=password at /dex/token), one call with no browser or cookie, then use it as MLFLOW_TRACKING_TOKEN through the OAuth proxy. Requires a Dex client whose grantTypes include "password" + the proxy's --skip-jwt-bearer-tokens. Warns to use a dedicated service account (ROPC sends the password) and store creds in a Secret.
- Rework the smoke test to ROPC: username/password -> Dex id token -> log to MLflow via the platform route (through oauth2-proxy), asserting run owner == token identity.
Verified ROPC mints a valid Dex id token (iss=dex, aud=alauda-auth, key in Dex JWKS) on g1-c1-x86.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mlflow-python-sdk.mdx now leads with the OAuth2 password grant: mint a Dex id token from a username/password at /dex/token, then use it as MLFLOW_TRACKING_TOKEN through the OAuth proxy. Adds an admin "Platform setup" section (--skip-jwt-bearer-tokens + a password-grant Dex client). The browser session-cookie flow is kept as a secondary "interactive alternative". Verified end-to-end on g1-c1-x86 (run owner = the token's user identity).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- SDK guide: set_tracking_uri now uses the in-cluster Service http://mlflow-tracking-server.kubeflow:5000 (still via the OAuth proxy) for in-cluster clients; note the platform route for outside-the-cluster use.
- Pipelines guide: rewritten to use the MLflow Python client against the in-cluster Service with MLFLOW_TRACKING_TOKEN injected from a Secret (kfp-kubernetes use_secret_as_env), and reference the SDK guide for auth/RBAC and minting the token (password grant). Drops the raw-REST/container-port helper. Trainer v2 example points MLFLOW_TRACKING_URI at the in-cluster Service. Example compiles with kfp 2.11 + kfp-kubernetes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The MLflow usage docs under training_guides now point to how_to/mlflow-python-sdk.mdx for authentication (MLFLOW_TRACKING_TOKEN) and workspace/RBAC on secured installs, where the bare MLFLOW_TRACKING_URI / report_to: mlflow setup is not sufficient:
- fine-tuning-using-notebooks.mdx (Experiment tracking sections)
- fine-tune-with-trainer-v2.ipynb (Step 5: View Training Metrics in MLflow)
Also corrects the menu path to Alauda AI -> Tools -> MLFlow.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…roxy's own Dex client
A dedicated Dex client cannot be used for the password grant on this platform: the OAuth proxy validates that the token audience equals its own client_id, so a separate client's token is rejected at the proxy. Document enabling `password` in the grantTypes of the proxy's own OAuth2Client (verified against the live cluster), with the kubectl patch, the aud constraint, and a security caveat. Update the matching troubleshooting row.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
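The audience constraint can be pictured as the check below. It is a generic sketch of an `aud` comparison, not oauth2-proxy's implementation; `alauda-auth` is the audience reported in an earlier commit message, while `mlflow-dedicated` is a hypothetical separate client.

```python
def audience_matches(claims: dict, client_id: str) -> bool:
    """Return True when the token's aud claim names client_id.

    Per the JWT spec, aud may be a single string or a list of strings.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == client_id
    if isinstance(aud, (list, tuple)):
        return client_id in aud
    return False


# Token minted by the proxy's own client: accepted.
print(audience_matches({"aud": "alauda-auth"}, "alauda-auth"))
# Token minted by a separate, dedicated client: rejected.
print(audience_matches({"aud": "mlflow-dedicated"}, "alauda-auth"))
```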

…ookie)
ROPC needs the password grant on the shared alauda-auth client, i.e. a change to the global auth server, which is off-limits. The platform already allows the authorization_code grant, and its login API is scriptable (PKCE; captcha is retry-gated, so a clean first login needs none). Rewrite the SDK guide around two browser-free methods, both verified end-to-end on g1-c1-x86:
- Bearer token (primary): scripted authorization_code+PKCE -> id_token as MLFLOW_TRACKING_TOKEN, renewed via the refresh_token grant. Needs --skip-jwt-bearer-tokens on the MLflow proxy (workload cluster, not global auth). Python helper + curl; both verified.
- Session cookie (fallback): same scripted login fed to the proxy callback -> _oauth2_proxy cookie. Zero platform changes.
Point pipelines-mlflow-integration at the SDK guide's token flow instead of the password grant (and fix the renamed platform-setup anchor). Rewrite the e2e smoke test to exercise both legs (token leg SKIPs cleanly when skip-jwt is off) and fix a cleanup bug where the _oauth2_proxy cookie value contains '|', which collided with the delimiter and leaked experiments.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
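The PKCE part of the scripted authorization_code flow comes down to a random verifier and its S256 challenge, as defined in RFC 7636. The sketch below shows only that step; it is not the guide's Python helper, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def s256_challenge(verifier: str) -> str:
    """Return BASE64URL(SHA256(verifier)) without padding (PKCE method S256)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Generate a fresh code_verifier and the matching code_challenge."""
    verifier = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe characters
    return verifier, s256_challenge(verifier)


verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))
```

The challenge goes into the authorization request together with `code_challenge_method=S256`; the verifier is held back and sent only when exchanging the code for tokens.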

…l -> 302 -> follows redirect to platform HTTPS)
The MLflow SDK reports SSLCertVerificationError when the proxy rejects the credential: it 302s to the login page and the client follows it to the platform's self-signed HTTPS endpoint. Document the real cause (fix the credential, not the TLS) and note the in-cluster http:// Service URL plus MLFLOW_TRACKING_INSECURE_TLS for the external route in the cookie section.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
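One way to see the real cause is to send a request that does not follow redirects, so a rejected credential shows up as a 302 with a `Location` header instead of a TLS error further down the chain. The sketch below uses only the standard library; the function and class names are hypothetical, and the URL and token are supplied by the caller.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None makes urllib surface the 3xx as an HTTPError


def probe(url, token):
    """Return (status, location) for one bearer-authenticated GET, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
```

A result such as `(302, "<login URL>")` points at the credential; a 200 means the proxy accepted it.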

Close the L0 8.2.3 gaps (QLoRA, continued pre-training) in the Training Hub corpus, matching the existing SFT/OSFT tutorial style. APIs verified against the real traininghub0.1-cu126-amd64 v0.1.0 runtime image on g1-c1-x86:
- QLoRA -> training_hub.lora_sft(load_in_4bit=True, bnb_4bit_quant_type="nf4")
- CPT -> training_hub.sft(is_pretraining=True, block_size, document_column_name)
Changes:
- training-hub-fine-tuning.mdx: algorithm table now covers SFT/OSFT/QLoRA/CPT; new "QLoRA (4-bit LoRA)" and "Continued pre-training (CPT)" sections (CUDA-first, with Ascend NPU notes); notebooks added to the examples table.
- qlora-comprehensive-tutorial.ipynb / cpt-comprehensive-tutorial.ipynb: runnable comprehensive notebooks mirroring the SFT/OSFT tutorials.
- e2e cases c13 (QLoRA) and c14 (CPT): self-contained (synthetic tiny Qwen2 + synthetic data, no model/corpus download). c13 drives lora_sft 4-bit QLoRA with a trl+peft+bitsandbytes fallback and an sm_75 arch guard; c14 drives sft(is_pretraining=True). Both SKIP (rc=77) with the captured scheduler event when no GPU slice is schedulable.
- run_all.sh: wire C13 (active); C14 left commented (like C4) until GPU frees up.
E2E smoke (g1-c1-x86, ns mlops-demo-e2e): c13 attempted for real -> SKIP(77): the only Ampere+ GPU (A30, sm_80) is 100% reserved by a persistent 27B inference pod (gpumem=24k/gpucores=100), and the only free GPU is a P100 (sm_60, no bitsandbytes 4-bit). Scheduler: FailedScheduling / CardInsufficientMemory.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
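The keyword arguments and the sm_75 guard named above can be sketched without a GPU or the runtime image. The helpers below only assemble arguments and compare a compute capability; their names, and any argument not listed in the commit message (such as `model_path` or `data_path`), are hypothetical.

```python
SKIP_RC = 77  # exit code the e2e cases use for SKIP, per the commit message


def supports_bnb_4bit(capability) -> bool:
    """True when (major, minor) compute capability is at least sm_75.

    `capability` has the shape returned by torch.cuda.get_device_capability().
    """
    return tuple(capability) >= (7, 5)


def qlora_kwargs(**common) -> dict:
    """Arguments for a 4-bit QLoRA run via training_hub.lora_sft."""
    return dict(common, load_in_4bit=True, bnb_4bit_quant_type="nf4")


def cpt_kwargs(block_size, document_column_name, **common) -> dict:
    """Arguments for continued pre-training via training_hub.sft."""
    return dict(
        common,
        is_pretraining=True,
        block_size=block_size,
        document_column_name=document_column_name,
    )


# A30 (sm_80) passes the guard; P100 (sm_60) would lead to SKIP_RC.
print(supports_bnb_4bit((8, 0)), supports_bnb_4bit((6, 0)))
```

With the runtime available, a call would look like `training_hub.lora_sft(**qlora_kwargs(model_path=..., data_path=...))`, where the extra argument names are assumptions to be checked against the installed version.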

coderabbitai · 2026-06-24T01:22:51Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


typhoonzero and others added 24 commits June 9, 2026 10:23

docs: fix coding agent inference guide

b0becaf

docs: remove non-existent --default-chat-template-kwargs flag

7871e1f

docs: clarify vllm reasoning effort support

285e68d

docs: refine agentic mlops tuning guidance

b18b5cd

Merge branch 'master' of https://github.com/alauda/aml-docs into docs…

79d27c9

…/coding-agents-inference-service

docs: fix lint error in pipelines-mlflow-integration guide

3c79b62

Merge branch 'master' of https://github.com/alauda/aml-docs into docs…

a6d351b

…/pipelines-mlflow-integration

update

ddff8e5

typhoonzero wants to merge 24 commits into master from loop/2026-06-22-traininghub-qlora-cpt
