feat(litellm): pass-through cache_control_injection_points for Anthropic prompt caching by lordzaharum · Pull Request #2405 · The-PR-Agent/pr-agent

lordzaharum · 2026-05-19T16:28:44Z

Summary

Adds a small config pass-through to expose LiteLLM SDK's cache_control_injection_points kwarg via .pr_agent.toml / configuration.toml, enabling Anthropic prompt caching for self-hosted PR-Agent setups.

[PR-TARGET-BYPASS] cross-repo upstream PR to qodo-ai/pr-agent (external project, main = integration branch).

Why

PR-Agent does not currently surface LiteLLM's prompt caching feature. For self-hosted setups with an Anthropic Claude backend, every review therefore pays the full input-token cost, even when the system prompt is static (typically 3-5K tokens from extra_instructions plus the persona).

LiteLLM SDK already supports cache_control_injection_points natively (see the LiteLLM prompt caching docs at https://docs.litellm.ai/docs/tutorials/prompt_caching), but PR-Agent does not expose it through its configuration layer.

Changes

pr_agent/algo/ai_handlers/litellm_ai_handler.py — read LITELLM.CACHE_CONTROL_INJECTION_POINTS from settings (mirroring the existing extra_headers handling immediately above), parse the JSON array, and pass it to the acompletion call via kwargs. ~12 lines.
pr_agent/settings/configuration.toml — add a commented-out default + usage example inside the existing [litellm] section. 3 lines.

Total diff: ~15 lines.

Usage

In .pr_agent.toml or configuration.toml:

[litellm]
cache_control_injection_points = '[{"location": "message", "role": "system"}]'

json is already imported in the handler file, so no new imports are needed.

Compatibility

Backwards compatible: empty / missing setting = no caching = current behavior.
LiteLLM >= 1.49.0 (already a transitive dependency of pr-agent).
Tested against an Anthropic Claude Sonnet 4.6 backend in a self-hosted PR-Agent GitHub Action.

Cost impact (real-world)

Production setup (self-hosted PR-Agent GitHub Action with Anthropic Claude Sonnet 4.6):

Before: ~24K input tokens per review, ~$0.10 / PR, paid in full on every iteration round.
After (expected): with caching enabled on the static system message (3-5K tokens), a 30-50% reduction on iterative review rounds within Anthropic's 5-minute cache TTL. Caching itself was verified in the Anthropic Console: cache_creation_input_tokens > 0 on the first review and cache_read_input_tokens > 0 on subsequent rounds.
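The expected saving depends on what share of the prompt is cached. The back-of-the-envelope sketch below estimates input-token cost only (output tokens are ignored). It assumes Anthropic's prompt-caching multipliers for the 5-minute TTL, where cache writes cost 1.25x and cache reads 0.1x the base input price, and treats the cached system message as the only cached prefix. The function names are illustrative, not part of PR-Agent.

```python
# Assumed Anthropic prompt-caching multipliers relative to the base input-token
# price for the 5-minute TTL: cache writes cost 1.25x, cache reads cost 0.1x.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def input_cost_ratio(total_tokens: int, cached_tokens: int, first_round: bool) -> float:
    """Input cost of one review round relative to the same round without caching."""
    uncached = total_tokens - cached_tokens
    mult = CACHE_WRITE_MULT if first_round else CACHE_READ_MULT
    return (uncached + cached_tokens * mult) / total_tokens


def cached_tokens_for_saving(total_tokens: int, saving: float) -> float:
    """Cached prefix size needed for a given fractional saving on a cache-hit round."""
    return saving * total_tokens / (1 - CACHE_READ_MULT)


TOTAL = 24_000  # ~24K input tokens per review (figure from this PR)
for cached in (3_000, 5_000):
    hit_saving = 1 - input_cost_ratio(TOTAL, cached, first_round=False)
    write_surcharge = input_cost_ratio(TOTAL, cached, first_round=True) - 1
    print(f"{cached} cached: {hit_saving:.1%} saved on hits, "
          f"{write_surcharge:.1%} extra on the first round")
for target in (0.30, 0.50):
    print(f"{target:.0%} saving needs ~{cached_tokens_for_saving(TOTAL, target):,.0f} cached tokens")
```

Under these assumptions, caching 3K-5K of ~24K input tokens saves about 11-19% of input cost per cache-hit round, while a 30-50% saving corresponds to roughly 8K-13K cached tokens. The cache_read_input_tokens figures reported in the Console are the natural check on which case applies.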

Validation

AST parse of edited handler: OK.
No-config behavior: unchanged (setting absent => branch not taken).
Invalid JSON / non-list value: raises ValueError with a clear message, mirroring the existing LITELLM.EXTRA_HEADERS validation pattern.

…pic prompt caching

Add config pass-through to expose LiteLLM SDK's cache_control_injection_points kwarg via .pr_agent.toml or configuration.toml. Enables Anthropic prompt caching for self-hosted PR-Agent setups:

[litellm]
cache_control_injection_points = '[{"location": "message", "role": "system"}]'

LiteLLM SDK supports this kwarg natively per https://docs.litellm.ai/docs/tutorials/prompt_caching, but PR-Agent did not surface it through configuration. With static system prompts of 3-5K tokens (typical extra_instructions), caching delivers a 30-50% input-token cost reduction on iterative review rounds within the 5-minute Anthropic TTL window.

Backwards compatible: empty/missing setting = current behavior (no caching).

qodo-free-for-open-source-projects · 2026-05-19T16:28:59Z

Review Summary by Qodo

(Agentic_describe updated until commit `f59c5c3`)

Add Anthropic prompt caching support via LiteLLM configuration pass-through

✨ Enhancement

Walkthroughs

Description

• Expose LiteLLM's cache_control_injection_points configuration for Anthropic prompt caching
• Enable 30-50% input-token cost reduction on iterative reviews via static system prompt caching
• Add JSON array configuration option in [litellm] section with validation
• Maintain backwards compatibility with existing behavior when setting is absent

Diagram

flowchart LR
  A["Configuration<br/>cache_control_injection_points"] -->|JSON parse & validate| B["LiteLLM Handler"]
  B -->|pass to kwargs| C["LiteLLM acompletion"]
  C -->|Anthropic API| D["Prompt Cache<br/>30-50% savings"]

File Changes

1. pr_agent/algo/ai_handlers/litellm_ai_handler.py ✨ Enhancement +12/-0

Add cache_control_injection_points pass-through to LiteLLM handler

• Read LITELLM.CACHE_CONTROL_INJECTION_POINTS from settings configuration
• Parse JSON array and validate it matches expected format (must be list)
• Pass parsed cache_control_injection_points to LiteLLM acompletion call via kwargs
• Include error handling with clear messages for invalid JSON or non-list values


2. pr_agent/settings/configuration.toml 📝 Documentation +3/-0

Document cache_control_injection_points configuration option

• Add commented-out cache_control_injection_points configuration option in [litellm] section
• Include usage example showing JSON array format for Anthropic prompt caching
• Add reference link to LiteLLM prompt caching documentation


qodo-free-for-open-source-projects · 2026-05-19T16:29:01Z

Code Review by Qodo

New Review Started

This review has been superseded by a new analysis

qodo-free-for-open-source-projects · 2026-05-19T16:30:11Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (3)

1. cache_control_injection_points overwrites kwargs 📘 Rule violation ⛨ Security

Description

The new config pass-through sets kwargs["cache_control_injection_points"] without checking whether
that key is already present. This can silently override caller-supplied/provider-supplied values and
violates the requirement to guard against parameter collisions when merging request kwargs.

Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R538-543]

+                if get_settings().get("LITELLM.CACHE_CONTROL_INJECTION_POINTS", None):
+                    try:
+                        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
+                        if not isinstance(cache_points, list):
+                            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
+                        kwargs["cache_control_injection_points"] = cache_points

Evidence

Rule 17 requires explicit collision guards when merging request parameters. The added code writes
kwargs["cache_control_injection_points"] directly, with no check for an existing value, so an
earlier source of the same key would be overwritten silently.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[538-544]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`kwargs["cache_control_injection_points"]` is written unconditionally once the setting exists, which can silently overwrite an already-populated value coming from upstream kwargs processing.

## Issue Context
PR Compliance requires explicit collision guards for merged request parameters so critical/behavior-changing kwargs cannot be overwritten without detection.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[538-545]

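A minimal sketch of the collision guard this finding asks for, written as a standalone helper. The name set_kwarg_once, the placeholder model string, and the choice to accept an identical existing value as a no-op are assumptions for illustration, not PR-Agent code.

```python
from typing import Any


def set_kwarg_once(kwargs: dict[str, Any], key: str, value: Any) -> None:
    """Set kwargs[key], refusing to silently overwrite a different existing value.

    Hypothetical helper: an identical value is accepted as a no-op, while a
    conflicting one raises so the collision is detected instead of hidden.
    """
    if key in kwargs and kwargs[key] != value:
        raise ValueError(
            f"{key} is already set to {kwargs[key]!r}; refusing to overwrite it with {value!r}"
        )
    kwargs[key] = value


# Illustrative request kwargs; "placeholder-model" stands in for a real model name.
kwargs = {"model": "placeholder-model", "temperature": 0.2}
points = [{"location": "message", "role": "system"}]
set_kwarg_once(kwargs, "cache_control_injection_points", points)
set_kwarg_once(kwargs, "cache_control_injection_points", points)  # same value: no-op
```

Raising is one option; logging a warning and keeping the upstream value would also make the collision detectable, at the cost of the configured setting being ignored.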

2. cache_points parsing misses TypeError 📘 Rule violation ☼ Reliability

Description

The new JSON parsing for cache_control_injection_points only handles json.JSONDecodeError, but
json.loads(...) can also raise TypeError when the setting is not a string/bytes (e.g., a TOML
value parsed into a native Python list). This violates the compliance expectation to
validate/normalize configuration at the boundary and surface targeted configuration errors instead
of leaking unexpected runtime exceptions that can abort a request.

Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R539-545]

+                    try:
+                        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
+                        if not isinstance(cache_points, list):
+                            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
+                        kwargs["cache_control_injection_points"] = cache_points
+                    except json.JSONDecodeError as e:
+                        raise ValueError(f"LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON: {str(e)}")

Evidence

The cited code path in chat_completion retrieves
get_settings().litellm.cache_control_injection_points and unconditionally applies
json.loads(...) while only catching json.JSONDecodeError, meaning non-string inputs can trigger
an unhandled TypeError. Since the settings loader parses TOML using tomllib.load, TOML
arrays/tables are naturally produced as native Python types (including lists/dicts), so a valid TOML
representation of this setting can reach this boundary as a list and cause the request to crash
instead of being validated/normalized and converted into a clear configuration error as required by
Rule 18.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[539-545]
pr_agent/custom_merge_loader.py[70-72]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-545]
pr_agent/settings/configuration.toml[321-331]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`cache_control_injection_points` is currently passed through `json.loads(...)` with handling only for `json.JSONDecodeError`, but the value can legitimately arrive from configuration as a native Python type (e.g., a list from TOML parsing), in which case `json.loads` raises `TypeError` that is unhandled and can abort the request. Update the boundary handling to validate/normalize the type and raise a targeted configuration error (or accept the already-parsed type) rather than leaking unexpected exceptions.

## Issue Context
This setting is configured via TOML (e.g., `configuration.toml` / `.pr_agent.toml`) and the settings loader uses `tomllib.load`, which yields native Python types such as lists and dicts for TOML arrays/tables. The current implementation always attempts JSON parsing regardless of the incoming type and only catches JSON decode failures, so mis-typed or naturally-typed TOML values can cause an unhandled `TypeError` instead of a clear, configuration-focused error message.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-546]

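One way to normalize the setting at the boundary, accepting either the JSON string shown in the Usage section or a value TOML already parsed into a list, so that every bad input becomes a targeted ValueError rather than a stray TypeError. The function name parse_cache_control_injection_points and the extra per-entry dict check are assumptions, not the PR's code.

```python
import json
from typing import Any, Optional

SETTING = "LITELLM.CACHE_CONTROL_INJECTION_POINTS"


def parse_cache_control_injection_points(raw: Any) -> Optional[list[dict[str, Any]]]:
    """Normalize the setting to a list of dicts, or None when it is unset or empty.

    Hypothetical helper: strings are decoded as JSON, already-parsed lists are
    accepted as-is, and anything else is reported as a configuration error.
    """
    if not raw:
        return None  # absent or empty: keep the current, uncached behavior
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError as e:
            raise ValueError(f"{SETTING} contains invalid JSON: {e}") from e
    if not isinstance(raw, list):
        raise ValueError(f"{SETTING} must be a JSON array, got {type(raw).__name__}")
    if not all(isinstance(point, dict) for point in raw):
        raise ValueError(f"{SETTING} entries must be JSON objects")
    return raw
```

Checking the type before calling json.loads means a native TOML array takes the same validated path as the string form instead of crashing.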

3. configuration.toml adds commented config 📘 Rule violation ⚙ Maintainability

Description

The PR adds new commented-out configuration lines documenting cache_control_injection_points. This
violates the requirement to avoid commented-out/dead code in submitted changes.

Code

pr_agent/settings/configuration.toml[R329-331]

+# cache_control_injection_points = "" # Optional: JSON array enabling Anthropic prompt caching via LiteLLM
+# Example: cache_control_injection_points = '[{"location": "message", "role": "system"}]'
+# See https://docs.litellm.ai/docs/tutorials/prompt_caching

Evidence
Rule 2 forbids introducing commented-out code blocks. The PR adds three new commented lines
describing and exemplifying cache_control_injection_points in configuration.toml.
Rule 2: No Dead or Commented-Out Code
pr_agent/settings/configuration.toml[329-331]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New commented-out configuration/example lines were added to `pr_agent/settings/configuration.toml`, which conflicts with the rule forbidding commented-out/dead code in PRs.

## Issue Context
If the intent is to document usage, prefer adding/expanding dedicated docs (or another approved documentation location) rather than adding commented-out config lines.

## Fix Focus Areas
- pr_agent/settings/configuration.toml[329-331]


4. Config errors retried 🐞 Bug ☼ Reliability

Description

Invalid cache_control_injection_points JSON raises ValueError, but chat_completion's catch-all wraps
any Exception as openai.APIError; because chat_completion is decorated to retry openai.APIError,
this deterministic config error will be retried multiple times and surfaced as an API failure.

Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R540-545]

+                        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
+                        if not isinstance(cache_points, list):
+                            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
+                        kwargs["cache_control_injection_points"] = cache_points
+                    except json.JSONDecodeError as e:
+                        raise ValueError(f"LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON: {str(e)}")

Evidence
The method is retried on openai.APIError and also converts generic exceptions into
openai.APIError; the new code path explicitly raises ValueError on invalid JSON, which will
therefore be wrapped and retried.
pr_agent/algo/ai_handlers/litellm_ai_handler.py[395-399]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-545]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[581-583]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Configuration validation failures (e.g., invalid JSON for `cache_control_injection_points`) currently raise `ValueError`, but the surrounding `except Exception` rethrows them as `openai.APIError`. Because `chat_completion` is wrapped in a tenacity `@retry` that retries on `openai.APIError`, these deterministic config errors get retried and are harder to diagnose.

## Issue Context
This impacts the newly-added `cache_control_injection_points` parsing path, but the fix should be localized to `chat_completion`'s exception handling.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[395-399]
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-546]
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[567-583]

## Suggested change
- Add an `except ValueError as e:` block before the generic `except Exception` to log and `raise` (preserve ValueError), so tenacity won’t retry and the error remains clearly a configuration problem.
- Keep the existing wrapping for genuinely unknown exceptions.

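The suggested change can be sketched without the real dependencies: below, APIError stands in for openai.APIError and retry_on for tenacity's @retry(retry=retry_if_exception_type(...)), and chat_completion is a synchronous stand-in for the real async method. All of these names are stand-ins chosen for the sketch. It shows a ValueError escaping after a single attempt while an unknown error is still wrapped and retried.

```python
import functools
from typing import Callable


class APIError(Exception):
    """Stand-in for openai.APIError so the sketch runs without the openai package."""


def retry_on(exc_type: type, attempts: int = 3):
    """Minimal stand-in for tenacity's @retry(retry=retry_if_exception_type(exc_type))."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator


calls = {"n": 0}  # counts attempts so the retry behavior is observable


@retry_on(APIError, attempts=3)
def chat_completion(work: Callable[[], str]) -> str:
    """Sync stand-in for the real async chat_completion's exception handling."""
    calls["n"] += 1
    try:
        return work()
    except ValueError:
        raise  # configuration error: re-raise unchanged so it is not retried
    except Exception as e:
        raise APIError(f"unexpected error: {e}") from e  # existing catch-all wrapping


def bad_config() -> str:
    raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON")


def flaky_api() -> str:
    raise ConnectionError("upstream timeout")
```

One trade-off: catching ValueError broadly also stops retries for any ValueError raised inside the LiteLLM call itself. A dedicated configuration-error exception type would be a narrower alternative.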

qodo-free-for-open-source-projects · 2026-05-19T16:35:07Z

Inline review comment on the diff lines quoted in finding 1, repeating it verbatim (cache_control_injection_points overwrites kwargs).

qodo-free-for-open-source-projects · 2026-05-19T16:35:07Z

Inline review comment on the diff lines quoted in finding 2, repeating it verbatim (cache_points parsing misses TypeError).

github-actions Bot added the feature 💡 label May 19, 2026

lordzaharum mentioned this pull request May 19, 2026

feat(litellm): cache_control_injection_points pass-through (lordzaharum/pr-agent#1, Merged)

lordzaharum closed this May 19, 2026

lordzaharum deleted the feat/cache-control-injection-points-passthrough branch May 19, 2026 16:29

lordzaharum restored the feat/cache-control-injection-points-passthrough branch May 19, 2026 16:29

lordzaharum reopened this May 19, 2026

qodo-free-for-open-source-projects Bot added the Compliance violation label May 19, 2026

qodo-free-for-open-source-projects Bot reviewed May 19, 2026

lordzaharum wants to merge 1 commit into The-PR-Agent:main from lordzaharum:feat/cache-control-injection-points-passthrough
