[e2e] Add nightly e2e test for submitting examples to flink standalone cluster#708
[e2e] Add nightly e2e test for submitting examples to flink standalone cluster#708matrixsparse wants to merge 1 commit into
Conversation
|
Hi @wenjin272, this PR implements the CI pipeline for #642 as discussed. Could you PTAL when you have time? |
8189bc8 to
704e45c
Compare
| on: | ||
| schedule: | ||
| - cron: '0 0 * * *' | ||
| workflow_dispatch: |
There was a problem hiding this comment.
Nightly + manual dispatch means a regression in examples/**, python/flink_agents/examples/**, or tools/install.sh can sit undetected for up to 24h. Would a path-filtered pull_request: trigger for those paths make sense here, with the cron staying as the safety net for transitive-dep changes? The Flink download + full build is non-trivial wall time per PR, so the nightly-only choice is defensible too — curious which trade-off you prefer.
There was a problem hiding this comment.
Agreed. added a path-filtered pull_request trigger for those paths. The cron stays as the safety net for transitive-dep changes. The path filter is narrow enough that most PRs won't trigger it, so wall-time cost is acceptable.
| failed=$((failed + 1)) | ||
| fi | ||
| done | ||
| printf "\nTotal: %d Passed: %d Failed: %d\n" "$total" "$passed" "$failed" |
There was a problem hiding this comment.
If install_flink, build_project, stage_dist_jars, or start_cluster dies under set -e, no result is ever recorded, so print_summary walks an empty RESULT_NAMES and prints Total: 0 Passed: 0 Failed: 0 before cleanup propagates the original non-zero exit code. The CI job still fails on the exit code, but a person scanning the log sees a "zero failures" summary right before the red X, which is misleading when triaging a 45-minute nightly run.
One way it could read, if useful:
if (( total == 0 )); then
log_error "Test setup failed before any example was submitted"
return
firight above the existing if (( failed > 0 )) check.
There was a problem hiding this comment.
Fixed exactly as you suggested.
xintongsong
left a comment
There was a problem hiding this comment.
Thanks for working on this, @matrixsparse . It's a good idea to test with the example jobs nightly.
I'm not sure about only validates the job submission success. I think currently all example jobs can run with local LLMs in Ollama. That shouldn't be a problem against verifying the full execution. Did I miss anything?
| log_ok "Staged: $(basename "$flink_jar")" | ||
| } | ||
|
|
||
| package_examples() { |
There was a problem hiding this comment.
I think build_project should have already built the examples. We should not need to re-build them.
There was a problem hiding this comment.
Good catch. Removed the redundant package_examples().
There was a problem hiding this comment.
You're right. I've updated the script to install Ollama, pull qwen3:8b, and wait for each job to reach FINISHED status instead of just verifying submission.
There was a problem hiding this comment.
In addition to verifying the jobs reaching the FINISH status, I think we can also check for the error logs to identify if the job is running properly. Flink's e2e test already have it and we may copy / reuse those approaches. See test-scripts/common.sh.
There was a problem hiding this comment.
Thanks for the suggestion! I've added a check_logs_for_errors() function that scans TaskManager/JobManager logs for exceptions after job completion, inspired by Flink's test-scripts/common.sh.
| log_section "Step 6: submit Java examples" | ||
| submit_java_example "org.apache.flink.agents.examples.ReActAgentExample" | ||
| submit_java_example "org.apache.flink.agents.examples.WorkflowSingleAgentExample" | ||
| submit_java_example "org.apache.flink.agents.examples.WorkflowMultipleAgentExample" | ||
|
|
||
| log_section "Step 7: submit Python examples" | ||
| submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/react_agent_example.py" | ||
| submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/workflow_single_agent_example.py" | ||
| submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/workflow_multiple_agent_example.py" |
There was a problem hiding this comment.
Is it intended to not cover all the example jobs?
There was a problem hiding this comment.
Yes, intentional. All 6 quickstart examples (3 Java + 3 Python) are covered. The RAG examples (python/flink_agents/examples/rag/) are excluded because they require a vector store and an embedding model that aren't provisioned in this CI setup. Added a comment in the script explaining this. We can add RAG coverage in a follow-up once vector store infrastructure is available in CI.
There was a problem hiding this comment.
I think there are 5 examples in Java, and 6 examples in Python (5 in quickstart/ and 1 in rag/). And there could be more in future. Is is possible to iterate over the example directory and submit everything it finds? (Might need to reorganize the example directory to follow certain pattern.)
As for the rag example, it uses a local ollama embedding model and a local chroma vector store, so there should be no problem running it locally in CI.
There was a problem hiding this comment.
Done. Examples are now auto-discovered (*Example.class / *_example.py), including RAG. CI uses qwen3:0.6b aliased to expected model names — configurable via OLLAMA_CHAT_MODEL env var.
b1de619 to
8e705b9
Compare
|
The nightly e2e tests threw the following exception: In 0.3 we renamed the I think we should replace the flink-agents in the cluster with the jar built from the current codebase. Concretely, we can first use install.sh to install Flink and flink-agents, then build flink-agents from the current branch and replace the flink-agents jar in FLINK_HOME/lib. Since this issue is primarily aimed at testing the submission of the quickstart example, and version 0.3 has already entered code freeze, I am changing the fixVersion of this issue from 0.3.0 to 0.4.0. In version 0.4, we will focus on refining the maturity and stability of existing capabilities. |
f18b52d to
e1182cd
Compare
e1182cd to
75ac60d
Compare
|
@wenjin272 Good catch, thanks for the analysis! Fixed — the script now removes any pre-existing flink-agents-dist-*.jar from FLINK_HOME/lib/ before copying the freshly built one, ensuring the cluster always uses the jar from the current branch. Also adjusted the test to verify successful submission (job ID obtained) as the primary pass criterion, given the lightweight CI model limitations. |
Purpose of change
Add automated e2e test for submitting Java/Python quickstart examples to a Flink standalone cluster, replacing the current manual verification process before each release.
Closes #642
Changes
e2e-test/test-scripts/test_submit_examples_to_flink.sh: Test script that installs Flink viainstall.sh, starts a standalone cluster, submits all 6 examples (3 Java + 3 Python), verifies submission success, and cleans up..github/workflows/nightly-e2e.yml: Nightly GitHub Actions workflow that runs the test daily at UTC 00:00, with manual trigger support.Key design decisions
tools/install.sh --non-interactive(from [tools]Import Wizard for Installation Setup #599) for Flink installation