Skip to content

perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path#27

Open
KA-ROM wants to merge 23 commits into
mainfrom
perf/cheap-allocs
Open

perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path#27
KA-ROM wants to merge 23 commits into
mainfrom
perf/cheap-allocs

Conversation

@KA-ROM

@KA-ROM KA-ROM commented Jun 5, 2026

Copy link
Copy Markdown

Summary

Hot-path tuning for the engine_* proxy path: removes per-request allocations and
redundant lookups, sets TCP_NODELAY on both client- and backend-facing connections,
and folds the cached metric handles into a single ProxyHttpMetrics struct. Several
post-process and TCP-tuning experiments were explored and reverted in-branch

Removed from the hot path

  • No heap alloc per header check: is_hop_by_hop_header matches HeaderName constants directly; the per-call lowercased String is gone. (Set is unchanged from before: connection/host/keep-alive/transfer-encoding.)
  • ~5 fewer hashmap lookups per request: in-flight Gauge/Counter + proxy_failure_count handles resolved once per worker, bundled into ProxyHttpMetrics behind Arc so cloning a handle to span an await is one ref-count bump (scc::HashMap::clone is a deep copy).
  • No metric lookup on UA hit path: per-worker scc::HashMap<String, Counter> keyed by &str. The cache is soft-capped at 100 entries to bound the lookup map.
    • Note: the cap bounds the cache only, not the Prometheus series (get_or_create runs unconditionally on a miss). Safe because the authrpc client side is a closed network (sequencer/peer-mirror) with tiny UA cardinality. A safety comment on the field flags that a public-facing endpoint must gate get_or_create instead.
  • connection_info() bound once/request instead of 5x: skips redundant RefCell re-borrow + extensions lookup per call.
  • No per-header revalidation on append: headers_mut().append(...) skips reparsing names through awc's builder. Also fixes a latent multi-value drop (insert_header replaced same-name values).
  • No re-parse on response forwarding: dropped HeaderName::from_str (backend names already valid; the removed branch could only ever silently drop a header).
  • No full HeaderMap clone: capture only content-encoding (the one field read downstream for decompress-on-logging). Client-facing headers are still forwarded verbatim.

Behavior changes (observable on wire / in ops)

  • TCP_NODELAY on backend conns: disables Nagle on the backend leg; logs at debug on failure.
  • TCP_NODELAY on client conns: same fix on the sequencer-facing side, in ConnectionGuard::on_connect alongside the existing keepalive setup.

Refactor (no behavior change)

  • ProxyHttpMetrics struct: extracts cached handles + UA cache out of ProxyHttp into an Arc<ProxyHttpMetrics> with bump_client_info, inc/dec_in_flight_*, record_proxy_failure.

Explored and reverted (kept in history)

Each showed no measurable benefit on our envs / flag sets, or introduced a structural concern. Kept as separate revert commits so any can be re-explored.

  • Reuse parsed JrpcRequestMetaMaybeBatch in finalise_proxying: only fires on the dedup-on path (off in all our envs), so dead code in practice.
  • Skip postprocess entirely for non-logged, non-mirrored requests: gated on logging-fully-off + no-mirroring; never fires in current envs.
  • Defer postprocess_client_request off the critical path: introduced a dependency on actix-web's body-poll ordering; structural fragility not worth the unmeasured saving.
  • Tighten awc disconnect_timeout to 100ms: awc calls no_disconnect_timeout() for plaintext TCP pools, so the knob was inert for our http:// backends. A configurable flag is deferred to a follow-up PR.
  • TCP_KEEPALIVE on backend conns (60s): awc conn_keep_alive(2 * backend_timeout) is also 60s, so the pool drops the idle conn at the same idleness threshold; no-op on loopback.

Unchanged

  • Which requests get intercepted, mirrored, or logged.
  • No new CLI flag, env var, or config surface.

@KA-ROM KA-ROM marked this pull request as draft June 5, 2026 13:10
@KA-ROM KA-ROM changed the title Perf/cheap allocs Small optimizations Jun 5, 2026
@KA-ROM KA-ROM force-pushed the perf/cheap-allocs branch from d7d7f94 to f86fd2d Compare June 9, 2026 12:40
@KA-ROM KA-ROM changed the title Small optimizations perf: set TCP_NODELAY, cheap allocs, socket tuning, postprocess deferral on engine_* path Jun 9, 2026
@KA-ROM KA-ROM requested a review from 0x416e746f6e June 9, 2026 12:54
@KA-ROM KA-ROM marked this pull request as ready for review June 9, 2026 12:54
@KA-ROM KA-ROM requested review from akundaz, avalonche and julio4 June 9, 2026 12:56
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/utils/utils_http.rs
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment on lines +1409 to +1419
let tcp_nodelay = actix_tls::connect::Connector::new(
actix_tls::connect::Resolver::default(),
)
.service()
.map(|conn: actix_tls::connect::Connection<awc::http::Uri, tokio::net::TcpStream>| {
let _ = conn.io_ref().set_nodelay(true);
let _ = socket2::SockRef::from(conn.io_ref()).set_tcp_keepalive(
&socket2::TcpKeepalive::new().with_time(Duration::from_secs(60)),
);
conn
});

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question:

w.d.y.t. about putting this behind CLI-switch? (so that the user could choose on case-by-case).

(TCP_NODELAY could still be the default).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo the CLI surface bloat doesn't seem worth it. rproxy's workload class is request-response JSON-RPC (small writes, latency-sensitive). Nagle on doesn't help anything in that shape. Nagle's actual use case is bundling many small writes into bigger packets to reduce per-packet overhead on slow/packet-counted links, which doesn't match the kind of traffic rproxy is built for. but I am open to more pushback, maybe you have use-cases in mind I didn't consider

Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
Comment thread crates/rproxy/src/server/proxy/http/proxy.rs Outdated
let mut clnt_res = Self::to_client_response(&bknd_res);

let preallocate = this.shared.config().prealloacated_response_buffer_size();
let content_encoding = bknd_res.headers().get(header::CONTENT_ENCODING).cloned();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question:

are we sure content-encoding is the only header we want to send back to the client?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still forward every header to the client (the loop in to_client_response). this change only affects what we keep around internally for our own processing. before, it copied the whole header bag, now we only copy the one header (content-encoding), to decompress the body for logging. nothing the client sees should be different

@KA-ROM KA-ROM changed the title perf: set TCP_NODELAY, cheap allocs, socket tuning, postprocess deferral on engine_* path perf: TCP_NODELAY + cheap allocs/lookups on the engine_* hot path Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants