Skip to content

[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] Add support for multishot reads Add io_uring support for waitid#1815

Open
opsiff wants to merge 19 commits into
deepin-community:linux-6.6.yfrom
opsiff:linux-6.6.y-2026-06-04-exit
Open

[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] Add support for multishot reads Add io_uring support for waitid#1815
opsiff wants to merge 19 commits into
deepin-community:linux-6.6.yfrom
opsiff:linux-6.6.y-2026-06-04-exit

Conversation

@opsiff

@opsiff opsiff commented Jun 4, 2026

Copy link
Copy Markdown
Member

https://lore.kernel.org/all/20230911204021.1479172-1-axboe@kernel.dk/

Hi,

We support multishot for other request types, generally in the shape of
a flag for the request. Doing a flag based approach with reads isn't
straightforward, as the read/write flags are in the RWF_ space. Instead,
add a separate opcode for this, IORING_OP_READ_MULTISHOT.

This can only be used provided buffers, like other multishot request
types that read/receive data.

It can also only be used for pollable file types, like a tun device or
pipes, for example. File types that are always readable (or seekable),
like regular files, cannot be used with multishot reads.

--
Jens Axboe

https://lore.kernel.org/all/20230909151124.1229695-1-axboe@kernel.dk/
Hi,

This adds support for IORING_OP_WAITID, which is an async variant of
the waitid(2) syscall. Rather than have a parent need to block waiting
on a child task state change, it can now simply get an async notication
when the requested state change has occured.

Patches 1..4 are purely prep patches, and should not have functional
changes. They split out parts of do_wait() into __do_wait(), so that
the prepare-to-wait and sleep parts are contained within do_wait().

Patch 5 adds io_uring support.

I wrote a few basic tests for this, which can be found in the
'waitid' branch of liburing:

https://git.kernel.dk/cgit/liburing/log/?h=waitid

Also spun a custom kernel for someone to test it, and no issues reported
so far.

The code can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-waitid

include/linux/io_uring_types.h | 2 +
include/uapi/linux/io_uring.h | 2 +
io_uring/Makefile | 3 +-
io_uring/cancel.c | 5 +
io_uring/io_uring.c | 3 +
io_uring/opdef.c | 10 +-
io_uring/waitid.c | 372 +++++++++++++++++++++++++++++++++
io_uring/waitid.h | 15 ++
kernel/exit.c | 131 ++++++------
kernel/exit.h | 30 +++
10 files changed, 512 insertions(+), 61 deletions(-)

Changes since v3:

  • Rebase on current tree
  • Move it before the futex changes. Note that if you're testing this,
    this means that the opcode values have changed. The liburing repo
    has been rebased as a result as well, you'll want to update that too.
    liburing also has update test cases.
  • Fix races between cancelation and wakeup trigger. This follows a
    scheme similar to the internal poll io_uring handling

--
Jens Axboe

Summary by Sourcery

Add io_uring support for multishot reads and async waitid, along with the necessary wait/waitid refactoring and integration.

New Features:

  • Introduce IORING_OP_READ_MULTISHOT to support multishot buffered reads on pollable file types.
  • Add IORING_OP_WAITID to provide asynchronous waitid(2) semantics via io_uring, including compat siginfo support.

Enhancements:

  • Refactor waitid-related data structures and logic into shared helpers (__do_wait, kernel_waitid_prepare, pid_child_should_wake) for reuse by io_uring.
  • Extend io_uring opcode definitions with a vectored flag and adjust read/write iovec handling to rely on per-op vectored metadata.
  • Integrate waitid requests into io_uring cancellation and teardown paths using a per-context waitid list.

Build:

  • Add the new io_uring waitid implementation object to the io_uring build in the Makefile.

@sourcery-ai

sourcery-ai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Reviewer's Guide

Adds io_uring multishot read support via a new IORING_OP_READ_MULTISHOT opcode and introduces asynchronous waitid support via a new IORING_OP_WAITID opcode, including core wait/waitid refactoring in kernel/exit.c to share logic with io_uring and to support safe cancellation and wakeup races.

Sequence diagram for io_uring WAITID async lifecycle and cancellation

sequenceDiagram
    actor App
    participant IoUring as io_uring
    participant WaitidPrep as io_waitid_prep
    participant WaitidIssue as io_waitid
    participant KWPrep as kernel_waitid_prepare
    participant DoWait as __do_wait
    participant WQ as wait_chldexit_queue
    participant WQCb as io_waitid_wait
    participant TW as io_waitid_cb
    participant Cancel as io_waitid_cancel

    App->>IoUring: submit SQE (opcode IORING_OP_WAITID)
    IoUring->>WaitidPrep: io_waitid_prep(sqe)
    WaitidPrep-->>IoUring: set io_waitid fields

    IoUring->>WaitidIssue: io_waitid(req, issue_flags)
    WaitidIssue->>KWPrep: kernel_waitid_prepare(&iwa.wo, which, upid, &info, options, NULL)
    KWPrep-->>WaitidIssue: 0 or error
    alt prepare error
        WaitidIssue-->>IoUring: io_req_set_res(error)
    else prepared
        WaitidIssue->>WQ: add_wait_queue(wait_chldexit_queue, &iwa.wo.child_wait)
        WaitidIssue->>DoWait: __do_wait(&iwa.wo)
        alt child already has matching state
            DoWait-->>WaitidIssue: ret >= 0 or < 0
            WaitidIssue->>IoUring: io_waitid_finish(ret)
        else need to sleep
            DoWait-->>WaitidIssue: -ERESTARTSYS
            Note over WaitidIssue,WQ: armed for async completion
        end
    end

    par child_state_change
        WQ->>WQCb: io_waitid_wait(child_wait, mode, sync, task)
        alt pid_child_should_wake
            WQCb->>TW: io_req_task_work_add(req)
        end
    and async_task_work
        TW->>DoWait: __do_wait(&iwa.wo)
        DoWait-->>TW: ret
        TW->>WaitidIssue: io_waitid_complete(req, ret)
        WaitidIssue->>IoUring: io_req_task_complete(req)
    and cancellation
        App->>Cancel: io_waitid_cancel(ctx, &cd, issue_flags)
        Cancel->>WaitidIssue: __io_waitid_cancel(ctx, req)
        alt first owner
            WaitidIssue->>WQ: list_del_init(&iwa.wo.child_wait.entry)
            WaitidIssue->>IoUring: io_waitid_complete(req, -ECANCELED)
        end
    end
Loading

File-Level Changes

Change Details Files
Refactor wait/waitid core logic to be reusable and to separate sleep/waitqueue handling from child-scanning logic.
  • Move waitid-related structs (waitid_info, wait_opts) and helpers into new kernel/exit.h header and include it from kernel/exit.c and io_uring waitid code.
  • Introduce pid_child_should_wake() to centralize child eligibility checks used by both traditional wait code and io_uring waitid.
  • Split do_wait() into __do_wait() (pure child scanning / semantics) and a thin do_wait() wrapper that manages waitqueue registration, sleep/retry loop, and signal handling.
  • Add kernel_waitid_prepare() to factor parameter validation and wait_opts setup out of kernel_waitid(), so io_uring can reuse it without duplicating logic.
  • Adjust kernel_waitid() to use kernel_waitid_prepare(), to use wo.wo_flags for WNOHANG/EAGAIN translation, and to manage wo.wo_pid lifetime via put_pid().
kernel/exit.c
kernel/exit.h
Add multishot read support to io_uring via a dedicated READ_MULTISHOT opcode and integrate it into the rw/opdef paths.
  • Add io_read_mshot_prep() and io_read_mshot() helpers that prepare and execute multishot reads, setting REQ_F_APOLL_MULTISHOT and enforcing that the target file is pollable.
  • Implement multishot completion semantics: recycle buffers on -EAGAIN, emit CQEs with IORING_CQE_F_MORE on positive results, terminate on errors or CQ overflow, and honor IO_URING_F_MULTISHOT handling.
  • Extend io_issue_def with a vectored flag and mark READ/WRITE as vectored to adjust buffer import and poll reissue logic.
  • Adjust __io_import_iovec() and __io_read() to use the new vectored flag and to avoid direct poll-based retry for vectored ops.
  • Expose IORING_OP_READ_MULTISHOT in the public uapi enum and cold defs, including a human-readable name.
  • Declare the new prep/issue functions in io_uring/rw.h.
io_uring/rw.c
io_uring/rw.h
io_uring/opdef.c
io_uring/opdef.h
include/uapi/linux/io_uring.h
Introduce asynchronous waitid support in io_uring, including a new WAITID opcode, request tracking, and safe cancellation and teardown.
  • Add new io_uring waitid implementation (waitid.c/waitid.h) that wraps kernel_waitid_prepare() and __do_wait(), and provides io_waitid_prep(), io_waitid(), io_waitid_cancel(), and io_waitid_remove_all().
  • Define io_waitid request state (io_waitid/io_waitid_async) carrying wait_opts, ref-counted ownership, waitqueue head pointer, and user-space siginfo pointers.
  • Implement waitqueue callback io_waitid_wait() and task_work callback io_waitid_cb() to handle wakeups, re-arm on -ERESTARTSYS, and resolve races between cancelation and wakeup using a reference counter with a cancel flag bit.
  • Implement helper paths to copy siginfo to user-space (including compat mode) and to free async state, as well as io_waitid_finish()/io_waitid_complete() that complete the request with proper CQE result and failure flagging.
  • Add a dedicated waitid_list hlist to io_ring_ctx, initialize it at ctx creation, and link WAITID requests on submission for lookup by cancel and teardown paths.
  • Wire waitid cancel and teardown into io_uring by calling io_waitid_cancel() from io_try_cancel() and io_waitid_remove_all() from io_uring_try_cancel_requests().
  • Expose IORING_OP_WAITID and its async context size in opdef.c cold defs and add a waitid_flags field to struct io_uring_sqe for future extension (validated as zero for now).
io_uring/waitid.c
io_uring/waitid.h
io_uring/opdef.c
io_uring/opdef.h
io_uring/io_uring.c
io_uring/cancel.c
include/linux/io_uring_types.h
include/uapi/linux/io_uring.h
io_uring/Makefile

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@deepin-ci-robot deepin-ci-robot requested a review from myml June 4, 2026 13:23
@deepin-ci-robot

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In kernel_waitid_prepare() you initialize f_flags to 0 and never assign to it, so the O_NONBLOCK-based WNOHANG behavior is effectively dead; either remove f_flags entirely or plumb the intended file flags in so that pidfd/non-blocking semantics are preserved.
  • The new io_uring/waitid.h header lacks an include guard, which can lead to multiple-definition issues if it is included more than once; add a conventional #ifndef/#define/#endif guard around its contents.
  • In struct io_waitid the struct file *file member is never used and can be removed to avoid confusion and keep the structure minimal.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `kernel_waitid_prepare()` you initialize `f_flags` to 0 and never assign to it, so the `O_NONBLOCK`-based WNOHANG behavior is effectively dead; either remove `f_flags` entirely or plumb the intended file flags in so that pidfd/non-blocking semantics are preserved.
- The new `io_uring/waitid.h` header lacks an include guard, which can lead to multiple-definition issues if it is included more than once; add a conventional `#ifndef/#define/#endif` guard around its contents.
- In `struct io_waitid` the `struct file *file` member is never used and can be removed to avoid confusion and keep the structure minimal.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request backports upstream io_uring enhancements by adding a new multishot read opcode and adding async waitid(2) support, including necessary refactoring of the core wait logic in kernel/exit.c so it can be reused from io_uring.

Changes:

  • Add IORING_OP_READ_MULTISHOT to support multishot reads using provided buffers on pollable file types.
  • Add IORING_OP_WAITID and a new io_uring/waitid.c implementation for asynchronous waitid(2) notifications, with integration into io_uring cancellation and teardown.
  • Refactor wait/waitid internals by introducing shared helpers (__do_wait(), kernel_waitid_prepare(), pid_child_should_wake()) and adjusting rw vectored handling to use per-op metadata.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
kernel/exit.h New internal header exposing wait helpers/structs for reuse by io_uring.
kernel/exit.c Refactors do_wait() logic into __do_wait() and adds shared helper functions.
io_uring/waitid.h Declares io_uring async waitid interfaces and async state container.
io_uring/waitid.c Implements the IORING_OP_WAITID opcode and cancellation/removal support.
io_uring/rw.h Exposes new multishot read prep/issue helpers.
io_uring/rw.c Adds multishot read implementation and updates vectored detection logic.
io_uring/opdef.h Adds a per-op vectored metadata bit to io_issue_def.
io_uring/opdef.c Registers READ_MULTISHOT and WAITID opcodes and marks vectored ops.
io_uring/Makefile Adds waitid.o to the io_uring build.
io_uring/io_uring.c Initializes ctx->waitid_list and cancels waitid requests on teardown.
io_uring/cancel.c Hooks async cancel into WAITID cancellation handling.
include/uapi/linux/io_uring.h Extends SQE union and adds opcode IDs for READ_MULTISHOT and WAITID.
include/linux/io_uring_types.h Adds waitid_list to io_ring_ctx for tracking outstanding waitid requests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread kernel/exit.c
Comment on lines 1599 to 1603
read_lock(&tasklist_lock);

if (wo->wo_type == PIDTYPE_PID) {
retval = do_wait_pid(wo);
if (retval)
Comment thread kernel/exit.c
Comment on lines +1631 to +1634
static long do_wait(struct wait_opts *wo)
{
int retval;

Comment thread kernel/exit.h
Comment on lines +1 to +4
// SPDX-License-Identifier: GPL-2.0-only
#ifndef LINUX_WAITID_H
#define LINUX_WAITID_H

axboe and others added 9 commits June 16, 2026 22:33
mainline inclusion
from mainline-v6.7-rc1
category: performance

This is cleaner than gating on the opcode type, particularly as more
read/write type opcodes may be added.

Then we can use that for the data import, and for __io_read() on
whether or not we need to copy state.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
	io_uring/rw.c
[because of ("io_uring: add io_file_can_poll() helper") merged before]
(cherry picked from commit d2d778f)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: performance

This behaves like IORING_OP_READ, except:

1) It only supports pollable files (eg pipes, sockets, etc). Note that
   for sockets, you probably want to use recv/recvmsg with multishot
   instead.

2) It supports multishot mode, meaning it will repeatedly trigger a
   read and fill a buffer when data is available. This allows similar
   use to recv/recvmsg but on non-sockets, where a single request will
   repeatedly post a CQE whenever data is read from it.

3) Because of #2, it must be used with provided buffers. This is
   uniformly true across any request type that supports multishot and
   transfers data, with the reason being that it's obviously not
   possible to pass in a single buffer for the data, as multiple reads
   may very well trigger before an application has a chance to process
   previous CQEs and the data passed from them.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
	io_uring/rw.c
(cherry picked from commit fc68fcd)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: other

Abstract out the helper that decides if we should wake up following
a wake_up() callback on our internal waitqueue.

No functional changes intended in this patch.

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 9d900d4)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: other

Rather than have a maze of gotos, put the actual logic in __do_wait()
and have do_wait() loop deal with waitqueue setup/teardown and whether
to call __do_wait() again.

No functional changes intended in this patch.

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 06a101c)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: other

Move the setup logic out of kernel_waitid(), and into a separate helper.

No functional changes intended in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit eda7e9d)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Move struct wait_opts and waitid_info into kernel/exit.h, and include
function declarations for the recently added helpers. Make them
non-static as well.

This is in preparation for adding a waitid operation through io_uring.
With the abtracted helpers, this is now possible.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 2e521a2)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: performance

This adds support for an async version of waitid(2), in a fully async
version. If an event isn't immediately available, wait for a callback
to trigger a retry.

The format of the sqe is as follows:

sqe->len		The 'which', the idtype being queried/waited for.
sqe->fd			The 'pid' (or id) being waited for.
sqe->file_index		The 'options' being set.
sqe->addr2		A pointer to siginfo_t, if any, being filled in.

buf_index, add3, and waitid_flags are reserved/unused for now.
waitid_flags will be used for options for this request type. One
interesting use case may be to add multi-shot support, so that the
request stays armed and posts a notification every time a monitored
process state change occurs.

Note that this does not support rusage, on Arnd's recommendation.

See the waitid(2) man page for details on the arguments.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit f31ecf6)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Retain top 8bits of uring_cmd flags for kernel internal use, so that we
can move IORING_URING_CMD_POLLED out of uapi header.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 528ce67)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: feature

uring_cmd may never complete, such as ublk, in which uring cmd isn't
completed until one new block request is coming from ublk block device.

Add cancelable uring_cmd to provide mechanism to driver for cancelling
pending commands in its own way.

Add API of io_uring_cmd_mark_cancelable() for driver to mark one command as
cancelable, then io_uring will cancel this command in
io_uring_cancel_generic(). ->uring_cmd() callback is reused for canceling
command in driver's way, then driver gets notified with the cancelling
from io_uring.

Add API of io_uring_cmd_get_task() to help driver cancel handler
deal with the canceling.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
	include/linux/io_uring.h
(cherry picked from commit 93b8cc6)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
@opsiff opsiff force-pushed the linux-6.6.y-2026-06-04-exit branch from b6407e6 to 4edc681 Compare June 16, 2026 14:37
Yeonba0918 and others added 10 commits June 16, 2026 22:56
mainline inclusion
from mainline-v7.1-rc5
category: bugfix

IORING_OP_WAITID stores its result fields in struct io_waitid::info and
later copies them to userspace siginfo. The prep path initializes the
request arguments, but it does not initialize info itself.

If the wait operation completes without reporting a child event, the common
wait code can return without writing wo_info. In that case io_waitid_finish()
still copies iw->info to userspace, exposing stale bytes from the reused
io_kiocb command storage.

Clear the result storage during prep so the io_uring path matches the
regular waitid syscall, which uses a zero-initialized struct waitid_info.

Fixes: f31ecf6 ("io_uring: add IORING_OP_WAITID support")
Cc: stable@vger.kernel.org # 6.7+
Signed-off-by: Heechan Kang <gganji11@naver.com>
Link: https://patch.msgid.link/20260516184709.852814-1-gganji11@naver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 93d93f5f8da791e98159795c6ef683f45bd95d13)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…eed it

mainline inclusion
from mainline-v6.7-rc1
category: bugfix

The new read multishot method doesn't need to allocate async data ever,
as it doesn't do vectored IO and it must only be used with provided
buffers. While it doesn't have ->prep_async() set, it also sets
->async_size to 0, which is different from any other read/write type we
otherwise support.

If it's used on a file type that isn't pollable, we do try and allocate
this async data, and then try and use that data. But since we passed in
a size of 0 for the data, we get a NULL back on data allocation. We then
proceed to dereference that to copy state, and that obviously won't end
well.

Add a check in io_setup_async_rw() for this condition, and avoid copying
state. Also add a check for whether or not buffer selection is specified
in prep while at it.

Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218101
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 0df96fb)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: bugfix

For addr: this field is not used, since buffer select is forced.
But by forcing it to be zero it leaves open future uses of the field.

len is actually usable, you could imagine that you want to receive
multishot up to a certain length.
However right now this is not how it is implemented, and it seems
safer to force this to be zero.

Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
Link: https://lore.kernel.org/r/20231106203909.197089-3-dyudaken@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 49fbe99)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.7-rc1
category: bugfix

When doing a multishot read, the code path reuses the old read
paths. However this breaks an assumption built into those paths,
namely that struct io_rw::len is available for reuse by __io_import_iovec.

For multishot this results in len being set for the first receive
call, and then subsequent calls are clamped to that buffer length
incorrectly.

Instead keep len as zero after recycling buffers, to reuse the full
buffer size of the next selected buffer.

Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
Link: https://lore.kernel.org/r/20231106203909.197089-4-dyudaken@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit e537592)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.8-rc3
category: bugfix

io_read_mshot() always relies on poll triggering retries, and this works
fine as long as we do a retry per size of the buffer being read. The
buffer size is given by the size of the buffer(s) in the given buffer
group ID.

But if we're reading less than what is available, then we don't always
get to read everything that is available. For example, if the buffers
available are 32 bytes and we have 64 bytes to read, then we'll
correctly read the first 32 bytes and then wait for another poll trigger
before we attempt the next read. This next poll trigger may never
happen, in which case we just sit forever and never make progress, or it
may trigger at some point in the future, and now we're just delivering
the available data much later than we should have.

io_read_mshot() could do retries itself, but that is wasteful as we'll
be going through all of __io_read() again, and most likely in vain.
Rather than do that, bump our poll reference count and have
io_poll_check_events() do one more loop and check with vfs_poll() if we
have more data to read. If we do, io_read_mshot() will get invoked again
directly and we'll read the next chunk.

io_poll_multishot_retry() must only get called from inside
io_poll_issue(), which is our multishot retry handler, as we know we
already "own" the request at this point.

Cc: stable@vger.kernel.org
Link: axboe/liburing#1041
Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit c79f52f)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.9-rc1
category: bugfix

We can't post CQEs from io-wq with DEFER_TASKRUN set, normal completions
are handled but aux should be explicitly disallowed by opcode handlers.

Cc: stable@vger.kernel.org
Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6fb7cba6f5366da25f4d3eb95273f062309d97fa.1709740837.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 70581dc)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.9-rc1
category: bugfix

If read multishot is being invoked from the poll retry handler, then we
should return IOU_ISSUE_SKIP_COMPLETE rather than -EAGAIN. If not, then
a CQE will be posted with -EAGAIN rather than triggering the retry when
the file is flagged as readable again.

Cc: stable@vger.kernel.org
Reported-by: Sargun Dhillon <sargun@meta.com>
Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 0a3737d)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.9-rc1
category: bugfix

We know the request is either being removed, or already in the process of
being removed through task_work, so we can delete it from our waitid list
upfront. This is important for remove all conditions, as we otherwise
will find it multiple times and prevent cancelation progress.

Remove the dead check in cancelation as well for the hash_node being
empty or not. We already have a waitid reference check for ownership,
so we don't need to check the list too.

Cc: stable@vger.kernel.org
Fixes: f31ecf6 ("io_uring: add IORING_OP_WAITID support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 2b35b8b)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.9-rc3
category: bugfix

Supporting multishot reads requires support for NOWAIT, as the
alternative would be always having io-wq execute the work item whenever
the poll readiness triggered. Any fast file type will have NOWAIT
support (eg it understands both O_NONBLOCK and IOCB_NOWAIT). If the
given file type does not, then simply resort to single shot execution.

Cc: stable@vger.kernel.org
Fixes: fc68fcd ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 2a975d4)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.18-rc1
category: bugfix

For a successful return, always remove our entry from the wait queue
entry list. Previously this was skipped if a cancelation was in
progress, but this can race with another invocation of the wait queue
entry callback.

Cc: stable@vger.kernel.org
Fixes: f31ecf6 ("io_uring: add IORING_OP_WAITID support")
Reported-by: syzbot+b9e83021d9c642a33d8c@syzkaller.appspotmail.com
Tested-by: syzbot+b9e83021d9c642a33d8c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/io-uring/68e5195e.050a0220.256323.001f.GAE@google.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 2f8229d)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants