HDDS-15359. Read chunks into pooled direct buffers by smengcl · Pull Request #10353 · apache/ozone

smengcl · 2026-05-24T05:55:46Z

Generated-by: Claude Code (Opus 4.7)

What changes were proposed in this pull request?

BufferUtils.assignByteBuffers previously allocated heap ByteBuffers for chunk reads. Reading from a FileChannel into a heap buffer forces the JDK to first read into a thread-local DirectByteBuffer (in sun.nio.ch.IOUtil) and then memcpy the chunk-sized payload into the user's heap buffer. So every default-config chunk read paid one extra user-space chunk-sized memcpy plus one chunk-sized heap allocation.

Two prior PRs added opt-in direct-buffer paths to skip exactly this copy:

HDDS-7188 (Tsz-Wo Nicholas Sze, Jan 2025): ChunkUtils.readDataNettyChunkedNioFile via Netty PooledByteBufAllocator.DEFAULT
HDDS-10488 (Sammi Chen, Sep 2024): MappedBufferManager mmap reads with semaphore-tracked quota

Both deliberately avoided raw ByteBuffer.allocateDirect because it is a known leak pattern under sustained load. Tiny wrapper objects mean GC fires rarely, the Cleaner that frees direct memory never runs promptly, and the native budget exhausts with OutOfMemoryError: Direct buffer memory. Neither PR updated the default assignByteBuffers path, so the heap allocation stayed.

This change uses Hadoop's org.apache.hadoop.util.DirectBufferPool, the same pool already used by BlockOutputStream and KeyValueContainerCheck. Release plumbing lives entirely in ChunkUtils.readData: a release lambda is registered on DispatcherContext.setReleaseMethod, fires once in GrpcXceiverService's finally block after responseObserver.onNext returns (the standard server marshaller is synchronous, so the response bytes are already serialized into the Netty outgoing buffer by then), and returns each buffer to the pool. Paths without a release-supporting context fall back to GC reclaim via the pool's WeakReference layer, so the pool degrades to plain allocateDirect rather than leaking. ChunkBuffer and ChunkBufferImplWithByteBufferList are unchanged from master.

A consumer audit confirmed no .array() usage breaks with direct buffers. ChecksumByteBufferImpl guards .array() with hasArray(), and UnsafeByteOperations.unsafeWrap of a direct ByteBuffer produces a zero-copy NioByteString without calling .array(). One test (TestChunkUtils.concurrentReadWriteOfSameFile) called .array() directly and was updated to use get(byte[]).

Expected impact

Per chunk read:

One chunk-sized user-space memcpy eliminated (~270 µs for a 4 MB chunk on a typical x86 server with ~15 GB/s memory bandwidth).
One chunk-sized heap allocation eliminated (heap churn moves to bounded direct memory in the pool).

Wall-clock effect depends on storage:

Storage	Latency per chunk	Effect
Page-cache hit	sub-ms total	meaningful (memcpy was a large fraction)
NVMe sequential	~1 ms total	~20 percent reduction
SATA SSD	~10 ms total	a few percent
HDD	tens of ms total	invisible at wall-clock

Heap-churn reduction is consistent across storage tiers: at 100 MB/s read throughput a DN sheds roughly 100 MB/s of heap allocation.

Pool sizing and operational note

DirectBufferPool is unbounded by design (per ConcurrentMap<Integer, Queue<WeakReference<ByteBuffer>>>). Steady-state pool memory converges to peak concurrent in-flight buffers per capacity class. For default-config chunk reads (4 MB) on a 64-thread DN, that is roughly 256 MB of direct memory. Operators should ensure -XX:MaxDirectMemorySize accommodates this; the existing BlockOutputStream and KeyValueContainerCheck consumers of DirectBufferPool are already sized this way.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15359

How was this patch tested?

Existing tests

Generated-by: Claude Code (Opus 4.7)

ChenSammi · 2026-05-27T07:55:59Z

@smengcl , thanks for working on this. The idea overall looks good to me. Stream read uses KeyValueHandler#readBlockImpl, it's not covered in the patch.

Further more, we should consider add a limit to the direct buffer pool in case it causes OOM failure. There is an existing example, BoundedElasticByteBufferPool.

HDDS-15359. Read chunks into pooled direct buffers

bdeb296

Generated-by: Claude Code (Opus 4.7)

smengcl added performance AI-gen datanode labels May 24, 2026

smengcl requested a review from ChenSammi May 24, 2026 05:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HDDS-15359. Read chunks into pooled direct buffers#10353

HDDS-15359. Read chunks into pooled direct buffers#10353
smengcl wants to merge 1 commit into
apache:masterfrom
smengcl:bufferutils-direct

smengcl commented May 24, 2026

Uh oh!

ChenSammi commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

smengcl commented May 24, 2026

What changes were proposed in this pull request?

Expected impact

Pool sizing and operational note

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

ChenSammi commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants