Draft: docs(training-hub): QLoRA + CPT guides + e2e (8.2.3)#270
Draft
typhoonzero wants to merge 24 commits into
- Narrow scope to Claude Code only; remove opencode and Codex CLI sections
- Add how to configure reasoning effort when starting the InferenceService (server-side --reasoning-effort flag and request-time override)
- Update Claude Code section with corrected proxy setup for LiteLLM and claude-code-router (config-driven, ccr code startup command)
- Qwen3.6 and Gemma 4 recommendations and Unsloth quantized model list already present; no change needed
The --reasoning-effort flag does not exist in vLLM. Replaced it with accurate guidance about server-wide control via --chat-template and request-level parameters.
…/coding-agents-inference-service
- Remove list preceding code block to avoid remark-lint-code-block-split-list
- Replace Python dict literals with dict() constructor to avoid JSX parsing
…/pipelines-mlflow-integration
The pipelines-mlflow-integration example did not run as written. Fixes verified against MLflow + KFP on g1-c1-x86:
- Import mlflow inside each @dsl.component (KFP v2 packages components from their own source; a module-level import raises NameError at runtime).
- Replace dsl.RUN_ID_PLACEHOLDER (removed in KFP v2) with dsl.PIPELINE_JOB_ID_PLACEHOLDER, passed in as a component argument.
- Document the secured-install access path: the mlflow-tracking-server Service fronts oauth2-proxy (302s headless clients), so components need a direct in-cluster Service, a ServiceAccount bearer token (MLFLOW_TRACKING_TOKEN), workspace RBAC, and a warm-up retry.
- Fix the Trainer v2 example (trainer.kubeflow.org/v1alpha1 TrainJob with runtimeRef/trainer, not TrainingJob/v1 with a raw pod template).
- Fix client.get_run_id -> run.run_id and the Tools menu path.

Also:
- Drop files unrelated to this PR's scope (agentic_mlops index + nav row, qwen3 finetune notebook) carried in from the coding-agents base branch.
- Remove dead _retry_kubectl_stdin_novalidate() from e2e/lib.sh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
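The "warm-up retry" mentioned in the access-path fix is not spelled out in this commit message. A minimal sketch of what such a helper could look like follows; the name `retry_warmup`, the attempt count, and the linear backoff are assumptions for illustration, not the guide's actual code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_warmup(call: Callable[[], T], attempts: int = 5, delay_s: float = 2.0) -> T:
    """Run `call` until it succeeds or `attempts` is exhausted (hypothetical helper)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:  # broad on purpose: any failure counts as "not warm yet"
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s * attempt)  # linear backoff between attempts
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

A component could wrap its first tracking call in `retry_warmup(...)` so that a server that is still starting does not fail the whole pipeline step.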
…ethod

Cross-checked against mlflow-plugin/mlflow-kubernetes-plugins:
- Name the canonical mechanism: the server's `kubernetes-auth` plugin authorizes via Kubernetes RBAC and accepts a ServiceAccount bearer token (Authorization / X-Forwarded-Access-Token) + X-MLFLOW-WORKSPACE.
- Fix caller RBAC resources to the plugin's API group set (experiments / datasets / registeredmodels); `runs` is not a resource (run writes authorize against `experiments`).
- Add the canonical out-of-cluster token path (`kubectl create token`) alongside the in-pod projected token.
- Document workspace selection via set_workspace() / MLFLOW_WORKSPACE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per mlflow-plugin/mlflow-kubernetes-plugins/docs/authorization-plugin.md:
- Lead with the identity-token method: the server's `kubernetes-auth` plugin (user_identity_token mode) authenticates the caller from the bearer token's identity claims, authorizes that identity, and records it as the MLflow run owner. The client authenticates with the token before any API call.
- Note the credential is a Kubernetes ServiceAccount token (the platform-wide `kubectl create token` pattern; sub claim is the identity).
- Add a security warning: because user_identity_token reads claims unverified (the oauth2-proxy is the verifier), a direct endpoint must be network-restricted / not exposed via ingress, or run the server in self_subject_access_review mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
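To make the security warning concrete: reading JWT claims without checking the signature means any syntactically valid token is believed. The sketch below is a generic illustration using only the standard library; it is not the plugin's code, and `read_claims_unverified` and the example identity are hypothetical.

```python
import base64
import json


def read_claims_unverified(jwt: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (illustration only)."""
    parts = jwt.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# A token with a bogus signature still yields whatever identity it claims.
forged = ".".join(
    [_b64url({"alg": "none"}), _b64url({"sub": "admin@example.com"}), "not-a-signature"]
)
print(read_claims_unverified(forged)["sub"])
```

This is why an endpoint that trusts claims this way has to sit behind a verifier (here, the oauth2-proxy) or be restricted at the network level.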
…e test

Reworks the KFP + MLflow guide to authenticate with a platform user identity token only (no ServiceAccount, no per-workspace RBAC, no extra in-cluster Service):
- The MLflow kubernetes-auth plugin (user_identity_token mode) takes the caller identity from the bearer token's claims and records it as the run owner.
- Components reach MLflow through the platform Kubernetes API (…/kubernetes/<cluster>/…/pods/<pod>:5000/proxy/…) and forward identity via X-Forwarded-Access-Token; the shipped Service only exposes the browser OAuth proxy, so this avoids it without creating anything.
- Removed the direct-Service, ServiceAccount-token, and RBAC sections.
- KFP example now uses a stdlib REST helper (no mlflow SDK install needed) and passes the token as a parameter (source from a Secret).

Adds e2e/mlflow-user-identity-smoke.sh: logs a run with a user token and asserts the run owner equals the token identity. Verified on g1-c1-x86 (run owner admin@cpaas.io); the pipeline example compiles with kfp 2.11.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New how_to/mlflow-python-sdk.mdx: how to drive the stock mlflow>=3.10 SDK against the auth + multi-tenant Alauda AI MLflow server with a platform user identity token — no ServiceAccount, no per-workspace RBAC, no extra Service. Covers MLFLOW_TRACKING_TOKEN auth, mlflow.set_workspace, the port-forward connection to the app port (raw tunnel preserves Authorization), model registry, the smoke test, and troubleshooting (302 / token-newline / 401 / 403). Verified on g1-c1-x86: runs are owned by the token identity. Cross-linked from mlflow.mdx Client Configuration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
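The "token-newline" troubleshooting item usually comes down to a trailing newline picked up when the token is read from a file or a mounted Secret. A minimal sketch of a loader that strips it before exporting `MLFLOW_TRACKING_TOKEN` is shown below; the helper name `load_tracking_token` and the file-based source are assumptions, not part of the guide.

```python
import os
from pathlib import Path


def load_tracking_token(path: str) -> str:
    """Read a bearer token from `path`, strip whitespace, and export it for MLflow."""
    token = Path(path).read_text(encoding="utf-8").strip()  # drops the trailing newline
    if not token:
        raise ValueError(f"token file {path!r} is empty")
    # MLFLOW_TRACKING_TOKEN is the environment variable the MLflow client reads
    # for bearer-token authentication.
    os.environ["MLFLOW_TRACKING_TOKEN"] = token
    return token
```

Calling this before the first MLflow API call keeps a stray `\n` out of the Authorization header.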
…cess)

Rework mlflow-python-sdk.mdx so the MLflow Python client always goes through the oauth2-proxy (the platform MLflow route) instead of port-forwarding to the container port:
- Interactive: present the browser SSO session by copying the _oauth2_proxy cookie and attaching it via a runtime-registered RequestHeaderProvider (verified: the provider injects the header and the run is owned by the caller identity).
- Headless/automation: admin enables oauth2-proxy --skip-jwt-bearer-tokens, then the client uses MLFLOW_TRACKING_TOKEN with a platform OIDC token.

Removes the kubectl port-forward / app-port connection entirely.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- SDK guide "Headless / automation": mint a short-lived Dex id token from a long-lived refresh token (refresh-token grant at /dex/token), then use it as MLFLOW_TRACKING_TOKEN through the OAuth proxy. Refresh before the 24h id-token expiry instead of carrying a static token.
- Rework the smoke test to the same method: refresh token -> id token -> log to MLflow via the platform route (through oauth2-proxy, no container-port access), asserting the run owner equals the token identity. Requires the proxy's --skip-jwt-bearer-tokens.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
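The refresh-token grant described above is a standard OAuth2 form POST. The sketch below only builds the request and does not send it; the helper name, the `openid` scope default, and the example host are assumptions, and the real client id and secret handling are whatever the platform's Dex configuration requires.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(
    token_url: str,
    refresh_token: str,
    client_id: str,
    client_secret: Optional[str] = None,
    scope: str = "openid",
) -> Request:
    """Build (but do not send) an OAuth2 refresh_token grant request."""
    form = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "scope": scope,
    }
    if client_secret:
        form["client_secret"] = client_secret
    return Request(
        token_url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host; the path /dex/token is the one named in the commit message.
req = build_refresh_request("https://platform.example/dex/token", "my-refresh-token", "my-client")
```

Sending `req` with `urllib.request.urlopen` and reading `id_token` from the JSON response would yield the value to export as MLFLOW_TRACKING_TOKEN; scheduling that call before the 24h expiry keeps the token fresh.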
- SDK guide "Headless / automation": mint a Dex id token with the OAuth2 password grant (grant_type=password at /dex/token), in one call with no browser or cookie, then use it as MLFLOW_TRACKING_TOKEN through the OAuth proxy. Requires a Dex client whose grantTypes include "password" + the proxy's --skip-jwt-bearer-tokens. Warns to use a dedicated service account (ROPC sends the password) and store creds in a Secret.
- Rework the smoke test to ROPC: username/password -> Dex id token -> log to MLflow via the platform route (through oauth2-proxy), asserting run owner == token identity.

Verified ROPC mints a valid Dex id token (iss=dex, aud=alauda-auth, key in Dex JWKS) on g1-c1-x86.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mlflow-python-sdk.mdx now leads with the OAuth2 password grant: mint a Dex id token from a username/password at /dex/token, then use it as MLFLOW_TRACKING_TOKEN through the OAuth proxy. Adds an admin "Platform setup" section (--skip-jwt-bearer-tokens + a password-grant Dex client). The browser session-cookie flow is kept as a secondary "interactive alternative". Verified end-to-end on g1-c1-x86 (run owner = the token's user identity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- SDK guide: set_tracking_uri now uses the in-cluster Service http://mlflow-tracking-server.kubeflow:5000 (still via the OAuth proxy) for in-cluster clients; note the platform route for outside-the-cluster use.
- Pipelines guide: rewritten to use the MLflow Python client against the in-cluster Service with MLFLOW_TRACKING_TOKEN injected from a Secret (kfp-kubernetes use_secret_as_env), and reference the SDK guide for auth/RBAC and minting the token (password grant). Drops the raw-REST/container-port helper. Trainer v2 example points MLFLOW_TRACKING_URI at the in-cluster Service.

Example compiles with kfp 2.11 + kfp-kubernetes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MLflow usage docs under training_guides now point to how_to/mlflow-python-sdk.mdx for authentication (MLFLOW_TRACKING_TOKEN) and workspace/RBAC on secured installs, where the bare MLFLOW_TRACKING_URI / report_to: mlflow setup is not sufficient:
- fine-tuning-using-notebooks.mdx (Experiment tracking sections)
- fine-tune-with-trainer-v2.ipynb (Step 5: View Training Metrics in MLflow)

Also corrects the menu path to Alauda AI -> Tools -> MLFlow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roxy's own Dex client A dedicated Dex client cannot be used for the password grant on this platform: the OAuth proxy validates that the token audience equals its own client_id, so a separate client's token is rejected at the proxy. Document enabling `password` in the grantTypes of the proxy's own OAuth2Client (verified against the live cluster), with the kubectl patch, the aud constraint, and a security caveat. Update the matching troubleshooting row. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ookie)

ROPC needs the password grant on the shared alauda-auth client, i.e. a change to the global auth server, which is off-limits. The platform already allows the authorization_code grant, and its login API is scriptable (PKCE; captcha is retry-gated, so a clean first login needs none). Rewrite the SDK guide around two browser-free methods, both verified end-to-end on g1-c1-x86:
- Bearer token (primary): scripted authorization_code+PKCE -> id_token as MLFLOW_TRACKING_TOKEN, renewed via the refresh_token grant. Needs --skip-jwt-bearer-tokens on the MLflow proxy (workload cluster, not global auth). Python helper + curl; both verified.
- Session cookie (fallback): same scripted login fed to the proxy callback -> _oauth2_proxy cookie. Zero platform changes.

Point pipelines-mlflow-integration at the SDK guide's token flow instead of the password grant (and fix the renamed platform-setup anchor).

Rewrite the e2e smoke test to exercise both legs (token leg SKIPs cleanly when skip-jwt is off) and fix a cleanup bug where the _oauth2_proxy cookie value contains '|', which collided with the delimiter and leaked experiments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…l -> 302 -> follows redirect to platform HTTPS) The MLflow SDK reports SSLCertVerificationError when the proxy rejects the credential: it 302s to the login page and the client follows it to the platform's self-signed HTTPS endpoint. Document the real cause (fix the credential, not the TLS) and note the in-cluster http:// Service URL plus MLFLOW_TRACKING_INSECURE_TLS for the external route in the cookie section. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Close the L0 8.2.3 gaps (QLoRA, continued pre-training) in the Training Hub corpus, matching the existing SFT/OSFT tutorial style.

APIs verified against the real traininghub0.1-cu126-amd64 v0.1.0 runtime image on g1-c1-x86:
- QLoRA -> training_hub.lora_sft(load_in_4bit=True, bnb_4bit_quant_type="nf4")
- CPT -> training_hub.sft(is_pretraining=True, block_size, document_column_name)

Changes:
- training-hub-fine-tuning.mdx: algorithm table now covers SFT/OSFT/QLoRA/CPT; new "QLoRA (4-bit LoRA)" and "Continued pre-training (CPT)" sections (CUDA-first, with Ascend NPU notes); notebooks added to the examples table.
- qlora-comprehensive-tutorial.ipynb / cpt-comprehensive-tutorial.ipynb: runnable comprehensive notebooks mirroring the SFT/OSFT tutorials.
- e2e cases c13 (QLoRA) and c14 (CPT): self-contained (synthetic tiny Qwen2 + synthetic data, no model/corpus download). c13 drives lora_sft 4-bit QLoRA with a trl+peft+bitsandbytes fallback and an sm_75 arch guard; c14 drives sft(is_pretraining=True). Both SKIP (rc=77) with the captured scheduler event when no GPU slice is schedulable.
- run_all.sh: wire C13 (active); C14 left commented (like C4) until GPU frees up.

E2E smoke (g1-c1-x86, ns mlops-demo-e2e): c13 attempted for real -> SKIP(77): the only Ampere+ GPU (A30, sm_80) is 100% reserved by a persistent 27B inference pod (gpumem=24k/gpucores=100), and the only free GPU is a P100 (sm_60, no bitsandbytes 4-bit). Scheduler: FailedScheduling / CardInsufficientMemory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
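The sm_75 arch guard in c13 is not shown in this commit message. A minimal sketch of such a check follows, assuming the threshold is compute capability 7.5 as the name suggests; `parse_sm` and `supports_4bit` are hypothetical names, and the real e2e case may obtain the capability differently.

```python
from typing import Tuple

# Assumed minimum compute capability for the 4-bit path, taken from the "sm_75" guard name.
MIN_4BIT_CAPABILITY = (7, 5)


def parse_sm(arch: str) -> Tuple[int, int]:
    """Turn an arch string such as 'sm_80' into a (major, minor) tuple."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])  # last digit is the minor version


def supports_4bit(arch: str) -> bool:
    """True when the GPU arch meets the assumed sm_75 minimum."""
    return parse_sm(arch) >= MIN_4BIT_CAPABILITY


# The two GPUs named in the smoke-test note:
for name, arch in (("A30", "sm_80"), ("P100", "sm_60")):
    print(name, arch, "ok" if supports_4bit(arch) else "skip")
```

Under this assumption the A30 passes the guard and the P100 does not, which matches the outcome reported for the smoke run.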
Daily-loop dev (2026-06-22). Adds runnable QLoRA + CPT tutorials, training-hub-fine-tuning.mdx sections, e2e cases c13/c14. GPU smoke SKIPped (A30 saturated). See .docs/loop/worklog-2026-06-22.md. Draft.