Skip to content

feat(litellm): pass-through cache_control_injection_points for Anthropic prompt caching#2405

Open
lordzaharum wants to merge 1 commit into
The-PR-Agent:mainfrom
lordzaharum:feat/cache-control-injection-points-passthrough
Open

feat(litellm): pass-through cache_control_injection_points for Anthropic prompt caching#2405
lordzaharum wants to merge 1 commit into
The-PR-Agent:mainfrom
lordzaharum:feat/cache-control-injection-points-passthrough

Conversation

@lordzaharum
Copy link
Copy Markdown

Summary

Adds a small config pass-through to expose LiteLLM SDK's cache_control_injection_points kwarg via .pr_agent.toml / configuration.toml, enabling Anthropic prompt caching for self-hosted PR-Agent setups.

[PR-TARGET-BYPASS] cross-repo upstream PR to qodo-ai/pr-agent (external project, main = integration branch).

Why

PR-Agent currently does not surface LiteLLM's prompt caching feature. For self-hosted setups with an Anthropic Claude backend, this means every review pays the full input-token cost even when the system prompt is static (typical 3-5K tokens from extra_instructions + persona).

LiteLLM SDK already supports cache_control_injection_points natively (see LiteLLM prompt caching docs), but PR-Agent does not expose it through its configuration layer.

Changes

  • pr_agent/algo/ai_handlers/litellm_ai_handler.py — read LITELLM.CACHE_CONTROL_INJECTION_POINTS from settings (mirroring the existing extra_headers handling immediately above), parse the JSON array, and pass it to the acompletion call via kwargs. ~12 lines.
  • pr_agent/settings/configuration.toml — add a commented-out default + usage example inside the existing [litellm] section. 3 lines.

Total diff: ~15 lines.

Usage

In .pr_agent.toml or configuration.toml:

[litellm]
cache_control_injection_points = '[{"location": "message", "role": "system"}]'

json is already imported in the handler file, so no new imports are needed.

Compatibility

  • Backwards compatible: empty / missing setting = no caching = current behavior.
  • LiteLLM >= 1.49.0 (already a transitive dependency of pr-agent).
  • Tested against Anthropic Claude Sonnet 4.6 backend in a self-hosted PR-Agent action.

Cost impact (real-world)

Production setup (self-hosted PR-Agent GitHub Action with Anthropic Claude Sonnet 4.6):

  • Before: ~24K input tokens per review, ~$0.10 / PR, paid in full on every iteration round.
  • After (expected): with caching enabled on the static system message (3-5K tokens), 30-50% reduction on iterative review rounds within the 5-minute Anthropic TTL window. Verified by cache_creation_input_tokens > 0 on first review and cache_read_input_tokens > 0 on subsequent rounds in the Anthropic Console.

Validation

  • AST parse of edited handler: OK.
  • No-config behavior: unchanged (setting absent => branch not taken).
  • Invalid JSON / non-list value: raises ValueError with a clear message, mirroring the existing LITELLM.EXTRA_HEADERS validation pattern.

…pic prompt caching

Add config pass-through to expose LiteLLM SDK's cache_control_injection_points
kwarg via .pr_agent.toml or configuration.toml.

Enables Anthropic prompt caching for self-hosted PR-Agent setups:

    [litellm]
    cache_control_injection_points = '[{"location": "message", "role": "system"}]'

LiteLLM SDK supports this kwarg natively per
https://docs.litellm.ai/docs/tutorials/prompt_caching
but PR-Agent did not surface it through configuration. With static system
prompts of 3-5K tokens (typical extra_instructions), caching delivers
30-50% input-token cost reduction on iterative review rounds within the
5-minute Anthropic TTL window.

Backwards compatible: empty/missing setting = current behavior (no caching).
@github-actions github-actions Bot added the feature 💡 label May 19, 2026
@qodo-free-for-open-source-projects
Copy link
Copy Markdown
Contributor

qodo-free-for-open-source-projects Bot commented May 19, 2026

Review Summary by Qodo

(Agentic_describe updated until commit f59c5c3)

Add Anthropic prompt caching support via LiteLLM configuration pass-through

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Expose LiteLLM's cache_control_injection_points configuration for Anthropic prompt caching
• Enable 30-50% input-token cost reduction on iterative reviews via static system prompt caching
• Add JSON array configuration option in [litellm] section with validation
• Maintain backwards compatibility with existing behavior when setting is absent
Diagram
flowchart LR
  A["Configuration<br/>cache_control_injection_points"] -->|JSON parse & validate| B["LiteLLM Handler"]
  B -->|pass to kwargs| C["LiteLLM acompletion"]
  C -->|Anthropic API| D["Prompt Cache<br/>30-50% savings"]
Loading

Grey Divider

File Changes

1. pr_agent/algo/ai_handlers/litellm_ai_handler.py ✨ Enhancement +12/-0

Add cache_control_injection_points pass-through to LiteLLM handler

• Read LITELLM.CACHE_CONTROL_INJECTION_POINTS from settings configuration
• Parse JSON array and validate it matches expected format (must be list)
• Pass parsed cache_control_injection_points to LiteLLM acompletion call via kwargs
• Include error handling with clear messages for invalid JSON or non-list values

pr_agent/algo/ai_handlers/litellm_ai_handler.py


2. pr_agent/settings/configuration.toml 📝 Documentation +3/-0

Document cache_control_injection_points configuration option

• Add commented-out cache_control_injection_points configuration option in [litellm] section
• Include usage example showing JSON array format for Anthropic prompt caching
• Add reference link to LiteLLM prompt caching documentation

pr_agent/settings/configuration.toml


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown
Contributor

qodo-free-for-open-source-projects Bot commented May 19, 2026

Code Review by Qodo

Grey Divider

New Review Started

This review has been superseded by a new analysis

Grey Divider

Qodo Logo

@lordzaharum lordzaharum deleted the feat/cache-control-injection-points-passthrough branch May 19, 2026 16:29
@lordzaharum lordzaharum restored the feat/cache-control-injection-points-passthrough branch May 19, 2026 16:29
@lordzaharum lordzaharum reopened this May 19, 2026
@qodo-free-for-open-source-projects
Copy link
Copy Markdown
Contributor

qodo-free-for-open-source-projects Bot commented May 19, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (3)

Grey Divider


Action required

1. cache_control_injection_points overwrites kwargs 📘 Rule violation ⛨ Security
Description
The new config pass-through sets kwargs["cache_control_injection_points"] without checking whether
that key is already present. This can silently override caller-supplied/provider-supplied values and
violates the requirement to guard against parameter collisions when merging request kwargs.
Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R538-543]

+                if get_settings().get("LITELLM.CACHE_CONTROL_INJECTION_POINTS", None):
+                    try:
+                        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
+                        if not isinstance(cache_points, list):
+                            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
+                        kwargs["cache_control_injection_points"] = cache_points
Evidence
Rule 17 requires explicit collision guards when merging request parameters. The added code writes
kwargs["cache_control_injection_points"] directly, with no check for an existing value, so an
earlier source of the same key would be overwritten silently.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[538-544]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`kwargs["cache_control_injection_points"]` is written unconditionally once the setting exists, which can silently overwrite an already-populated value coming from upstream kwargs processing.

## Issue Context
PR Compliance requires explicit collision guards for merged request parameters so critical/behavior-changing kwargs cannot be overwritten without detection.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[538-545]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. cache_points parsing misses TypeError 📘 Rule violation ☼ Reliability
Description
The new JSON parsing for cache_control_injection_points only handles json.JSONDecodeError, but
json.loads(...) can also raise TypeError when the setting is not a string/bytes (e.g., a TOML
value parsed into a native Python list). This violates the compliance expectation to
validate/normalize configuration at the boundary and surface targeted configuration errors instead
of leaking unexpected runtime exceptions that can abort a request.
Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R539-545]

+                    try:
+                        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
+                        if not isinstance(cache_points, list):
+                            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
+                        kwargs["cache_control_injection_points"] = cache_points
+                    except json.JSONDecodeError as e:
+                        raise ValueError(f"LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON: {str(e)}")
Evidence
The cited code path in chat_completion retrieves
get_settings().litellm.cache_control_injection_points and unconditionally applies
json.loads(...) while only catching json.JSONDecodeError, meaning non-string inputs can trigger
an unhandled TypeError. Since the settings loader parses TOML using tomllib.load, TOML
arrays/tables are naturally produced as native Python types (including lists/dicts), so a valid TOML
representation of this setting can reach this boundary as a list and cause the request to crash
instead of being validated/normalized and converted into a clear configuration error as required by
Rule 18.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[539-545]
pr_agent/custom_merge_loader.py[70-72]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-545]
pr_agent/settings/configuration.toml[321-331]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`cache_control_injection_points` is currently passed through `json.loads(...)` with handling only for `json.JSONDecodeError`, but the value can legitimately arrive from configuration as a native Python type (e.g., a list from TOML parsing), in which case `json.loads` raises `TypeError` that is unhandled and can abort the request. Update the boundary handling to validate/normalize the type and raise a targeted configuration error (or accept the already-parsed type) rather than leaking unexpected exceptions.

## Issue Context
This setting is configured via TOML (e.g., `configuration.toml` / `.pr_agent.toml`) and the settings loader uses `tomllib.load`, which yields native Python types such as lists and dicts for TOML arrays/tables. The current implementation always attempts JSON parsing regardless of the incoming type and only catches JSON decode failures, so mis-typed or naturally-typed TOML values can cause an unhandled `TypeError` instead of a clear, configuration-focused error message.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-546]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

3. configuration.toml adds commented config 📘 Rule violation ⚙ Maintainability
Description
The PR adds new commented-out configuration lines documenting cache_control_injection_points. This
violates the requirement to avoid commented-out/dead code in submitted changes.
Code

pr_agent/settings/configuration.toml[R329-331]

+# cache_control_injection_points = "" # Optional: JSON array enabling Anthropic prompt caching via LiteLLM
+# Example: cache_control_injection_points = '[{"location": "message", "role": "system"}]'
+# See https://docs.litellm.ai/docs/tutorials/prompt_caching
Evidence
Rule 2 forbids introducing commented-out code blocks. The PR adds three new commented lines
describing and exemplifying cache_control_injection_points in configuration.toml.

Rule 2: No Dead or Commented-Out Code
pr_agent/settings/configuration.toml[329-331]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New commented-out configuration/example lines were added to `pr_agent/settings/configuration.toml`, which conflicts with the rule forbidding commented-out/dead code in PRs.

## Issue Context
If the intent is to document usage, prefer adding/expanding dedicated docs (or another approved documentation location) rather than adding commented-out config lines.

## Fix Focus Areas
- pr_agent/settings/configuration.toml[329-331]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Config errors retried 🐞 Bug ☼ Reliability
Description
Invalid cache_control_injection_points JSON raises ValueError, but chat_completion's catch-all wraps
any Exception as openai.APIError; because chat_completion is decorated to retry openai.APIError,
this deterministic config error will be retried multiple times and surfaced as an API failure.
Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R540-545]

+                        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
+                        if not isinstance(cache_points, list):
+                            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
+                        kwargs["cache_control_injection_points"] = cache_points
+                    except json.JSONDecodeError as e:
+                        raise ValueError(f"LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON: {str(e)}")
Evidence
The method is retried on openai.APIError and also converts generic exceptions into
openai.APIError; the new code path explicitly raises ValueError on invalid JSON, which will
therefore be wrapped and retried.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[395-399]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-545]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[581-583]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Configuration validation failures (e.g., invalid JSON for `cache_control_injection_points`) currently raise `ValueError`, but the surrounding `except Exception` rethrows them as `openai.APIError`. Because `chat_completion` is wrapped in a tenacity `@retry` that retries on `openai.APIError`, these deterministic config errors get retried and are harder to diagnose.

## Issue Context
This impacts the newly-added `cache_control_injection_points` parsing path, but the fix should be localized to `chat_completion`'s exception handling.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[395-399]
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-546]
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[567-583]

## Suggested change
- Add an `except ValueError as e:` block before the generic `except Exception` to log and `raise` (preserve ValueError), so tenacity won’t retry and the error remains clearly a configuration problem.
- Keep the existing wrapping for genuinely unknown exceptions.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment on lines +538 to +543
if get_settings().get("LITELLM.CACHE_CONTROL_INJECTION_POINTS", None):
try:
cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
if not isinstance(cache_points, list):
raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
kwargs["cache_control_injection_points"] = cache_points
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. cache_control_injection_points overwrites kwargs 📘 Rule violation ⛨ Security

The new config pass-through sets kwargs["cache_control_injection_points"] without checking whether
that key is already present. This can silently override caller-supplied/provider-supplied values and
violates the requirement to guard against parameter collisions when merging request kwargs.
Agent Prompt
## Issue description
`kwargs["cache_control_injection_points"]` is written unconditionally once the setting exists, which can silently overwrite an already-populated value coming from upstream kwargs processing.

## Issue Context
PR Compliance requires explicit collision guards for merged request parameters so critical/behavior-changing kwargs cannot be overwritten without detection.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[538-545]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +539 to +545
try:
cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
if not isinstance(cache_points, list):
raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
kwargs["cache_control_injection_points"] = cache_points
except json.JSONDecodeError as e:
raise ValueError(f"LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON: {str(e)}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. cache_points parsing misses typeerror 📘 Rule violation ☼ Reliability

The new JSON parsing for cache_control_injection_points only handles json.JSONDecodeError, but
json.loads(...) can also raise TypeError when the setting is not a string/bytes (e.g., a TOML
value parsed into a native Python list). This violates the compliance expectation to
validate/normalize configuration at the boundary and surface targeted configuration errors instead
of leaking unexpected runtime exceptions that can abort a request.
Agent Prompt
## Issue description
`cache_control_injection_points` is currently passed through `json.loads(...)` with handling only for `json.JSONDecodeError`, but the value can legitimately arrive from configuration as a native Python type (e.g., a list from TOML parsing), in which case `json.loads` raises `TypeError` that is unhandled and can abort the request. Update the boundary handling to validate/normalize the type and raise a targeted configuration error (or accept the already-parsed type) rather than leaking unexpected exceptions.

## Issue Context
This setting is configured via TOML (e.g., `configuration.toml` / `.pr_agent.toml`) and the settings loader uses `tomllib.load`, which yields native Python types such as lists and dicts for TOML arrays/tables. The current implementation always attempts JSON parsing regardless of the incoming type and only catches JSON decode failures, so mis-typed or naturally-typed TOML values can cause an unhandled `TypeError` instead of a clear, configuration-focused error message.

## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-546]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant