feat(litellm): pass-through cache_control_injection_points for Anthropic prompt caching #2405
Add config pass-through to expose LiteLLM SDK's cache_control_injection_points
kwarg via .pr_agent.toml or configuration.toml.
Enables Anthropic prompt caching for self-hosted PR-Agent setups:
[litellm]
cache_control_injection_points = '[{"location": "message", "role": "system"}]'
LiteLLM SDK supports this kwarg natively per
https://docs.litellm.ai/docs/tutorials/prompt_caching
but PR-Agent did not surface it through configuration. With static system
prompts of 3-5K tokens (typical extra_instructions), caching delivers
30-50% input-token cost reduction on iterative review rounds within the
5-minute Anthropic TTL window.
Backwards compatible: empty/missing setting = current behavior (no caching).
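The savings estimate above depends on how much of each request is the cached system prompt and how many rounds land inside the TTL window. The arithmetic can be sketched as follows. The price multipliers (cache writes at 1.25x and cache reads at 0.1x the base input price) and the token counts are assumptions for illustration, not measurements from this PR; check them against current Anthropic pricing.

```python
# Assumed price multipliers relative to the base input-token price.
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: surcharge for writing a 5-minute cache entry
CACHE_READ_MULTIPLIER = 0.10   # assumption: discounted price for a cache hit


def relative_input_cost(cached_tokens: int, uncached_tokens: int, rounds: int) -> float:
    """Return input cost with caching as a fraction of the cost without caching.

    Round 1 writes the cache; every later round is assumed to hit it,
    i.e. all rounds fall inside the TTL window.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    baseline = rounds * (cached_tokens + uncached_tokens)
    first = cached_tokens * CACHE_WRITE_MULTIPLIER + uncached_tokens
    later = (rounds - 1) * (cached_tokens * CACHE_READ_MULTIPLIER + uncached_tokens)
    return (first + later) / baseline


# Hypothetical workload: 4K-token static system prompt, 4K-token diff, 5 review rounds.
saving = 1 - relative_input_cost(cached_tokens=4000, uncached_tokens=4000, rounds=5)
print(f"input-token cost reduction: {saving:.1%}")
```

Under these assumptions the reduction is about a third over five rounds, while a single isolated review costs slightly more than it would without caching because of the write surcharge.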
Review Summary by Qodo (Agentic_describe updated until commit f59c5c3)
Add Anthropic prompt caching support via LiteLLM configuration pass-through
Walkthroughs
Description
• Expose LiteLLM's cache_control_injection_points configuration for Anthropic prompt caching
• Enable 30-50% input-token cost reduction on iterative reviews via static system prompt caching
• Add JSON array configuration option in [litellm] section with validation
• Maintain backwards compatibility with existing behavior when setting is absent
Diagram
flowchart LR
A["Configuration<br/>cache_control_injection_points"] -->|JSON parse & validate| B["LiteLLM Handler"]
B -->|pass to kwargs| C["LiteLLM acompletion"]
C -->|Anthropic API| D["Prompt Cache<br/>30-50% savings"]
File Changes
1. pr_agent/algo/ai_handlers/litellm_ai_handler.py
Code Review by Qodo
1. cache_control_injection_points overwrites kwargs
if get_settings().get("LITELLM.CACHE_CONTROL_INJECTION_POINTS", None):
    try:
        cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
        if not isinstance(cache_points, list):
            raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
        kwargs["cache_control_injection_points"] = cache_points
1. cache_control_injection_points overwrites kwargs 📘 Rule violation ⛨ Security
The new config pass-through sets kwargs["cache_control_injection_points"] without checking whether that key is already present. This can silently override caller-supplied/provider-supplied values and violates the requirement to guard against parameter collisions when merging request kwargs.
Agent Prompt
## Issue description
`kwargs["cache_control_injection_points"]` is written unconditionally once the setting exists, which can silently overwrite an already-populated value coming from upstream kwargs processing.
## Issue Context
PR Compliance requires explicit collision guards for merged request parameters so critical/behavior-changing kwargs cannot be overwritten without detection.
## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[538-545]
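One way to address the collision concern is a small guard around the assignment. The following is a minimal sketch, not the handler's actual code: the helper name `set_cache_points` is hypothetical, and raising on collision is an assumed policy (keeping the existing value and logging a warning would be an equally valid choice).

```python
from typing import Any

CACHE_KEY = "cache_control_injection_points"


def set_cache_points(kwargs: dict[str, Any], cache_points: list) -> dict[str, Any]:
    """Add the configured cache points to kwargs unless the key is already set.

    Hypothetical helper: it refuses to overwrite a value that an earlier
    step put into kwargs, so a collision is detected instead of hidden.
    """
    if CACHE_KEY in kwargs:
        raise ValueError(
            f"{CACHE_KEY} is already present in request kwargs; "
            "refusing to overwrite it with the configured value"
        )
    kwargs[CACHE_KEY] = cache_points
    return kwargs
```

The guard checks for the key's presence rather than its truthiness, so an explicitly empty list supplied upstream also counts as a collision.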
try:
    cache_points = json.loads(get_settings().litellm.cache_control_injection_points)
    if not isinstance(cache_points, list):
        raise ValueError("LITELLM.CACHE_CONTROL_INJECTION_POINTS must be a JSON array")
    kwargs["cache_control_injection_points"] = cache_points
except json.JSONDecodeError as e:
    raise ValueError(f"LITELLM.CACHE_CONTROL_INJECTION_POINTS contains invalid JSON: {str(e)}")
2. cache_points parsing misses TypeError 📘 Rule violation ☼ Reliability
The new JSON parsing for cache_control_injection_points only handles json.JSONDecodeError, but json.loads(...) can also raise TypeError when the setting is not a string/bytes (e.g., a TOML value parsed into a native Python list). This violates the compliance expectation to validate/normalize configuration at the boundary and surface targeted configuration errors instead of leaking unexpected runtime exceptions that can abort a request.
Agent Prompt
## Issue description
`cache_control_injection_points` is currently passed through `json.loads(...)` with handling only for `json.JSONDecodeError`, but the value can legitimately arrive from configuration as a native Python type (e.g., a list from TOML parsing), in which case `json.loads` raises `TypeError` that is unhandled and can abort the request. Update the boundary handling to validate/normalize the type and raise a targeted configuration error (or accept the already-parsed type) rather than leaking unexpected exceptions.
## Issue Context
This setting is configured via TOML (e.g., `configuration.toml` / `.pr_agent.toml`) and the settings loader uses `tomllib.load`, which yields native Python types such as lists and dicts for TOML arrays/tables. The current implementation always attempts JSON parsing regardless of the incoming type and only catches JSON decode failures, so mis-typed or naturally-typed TOML values can cause an unhandled `TypeError` instead of a clear, configuration-focused error message.
## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[535-546]
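The boundary handling described above could look roughly like this. It is a sketch under assumptions: the helper name `normalize_cache_points` is hypothetical, and accepting an already-parsed list (rather than rejecting it) is one of the two options the prompt mentions.

```python
import json
from typing import Any

SETTING = "LITELLM.CACHE_CONTROL_INJECTION_POINTS"


def normalize_cache_points(raw: Any) -> list:
    """Return the setting as a list, accepting a JSON string or a native list.

    Hypothetical helper: strings are parsed as JSON, lists pass through
    unchanged, and anything else becomes a targeted ValueError instead of
    an unhandled TypeError from json.loads.
    """
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError as e:
            raise ValueError(f"{SETTING} contains invalid JSON: {e}") from e
    if not isinstance(raw, list):
        raise ValueError(
            f"{SETTING} must be a JSON array string or a TOML array, "
            f"got {type(raw).__name__}"
        )
    return raw
```

With this shape, both the quoted JSON form and a native TOML array would yield the same list, and a mis-typed value such as a table or integer fails with a message that names the setting.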
Summary
Adds a small config pass-through to expose LiteLLM SDK's `cache_control_injection_points` kwarg via `.pr_agent.toml` / `configuration.toml`, enabling Anthropic prompt caching for self-hosted PR-Agent setups.
[PR-TARGET-BYPASS] cross-repo upstream PR to qodo-ai/pr-agent (external project, main = integration branch).
Why
PR-Agent currently does not surface LiteLLM's prompt caching feature. For self-hosted setups with an Anthropic Claude backend, this means every review pays the full input-token cost even when the system prompt is static (typical 3-5K tokens from `extra_instructions` + persona).
LiteLLM SDK already supports `cache_control_injection_points` natively (see LiteLLM prompt caching docs), but PR-Agent does not expose it through its configuration layer.
Changes
- `pr_agent/algo/ai_handlers/litellm_ai_handler.py`: read `LITELLM.CACHE_CONTROL_INJECTION_POINTS` from settings (mirroring the existing `extra_headers` handling immediately above), parse the JSON array, and pass it to the `acompletion` call via `kwargs`. ~12 lines.
- `pr_agent/settings/configuration.toml`: add a commented-out default + usage example inside the existing `[litellm]` section. 3 lines.
Total diff: ~15 lines.
Usage
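In `.pr_agent.toml` or `configuration.toml`:
[litellm]
cache_control_injection_points = '[{"location": "message", "role": "system"}]'
`json` is already imported in the handler file, so no new imports are needed.
Compatibility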
Cost impact (real-world)
Production setup (self-hosted PR-Agent GitHub Action with Anthropic Claude Sonnet 4.6):
`cache_creation_input_tokens > 0` on first review and `cache_read_input_tokens > 0` on subsequent rounds in the Anthropic Console.
Validation
Invalid JSON or a non-array value raises `ValueError` with a clear message, mirroring the existing `LITELLM.EXTRA_HEADERS` validation pattern.