DoxigAlpha

init_params

A powerful way to set up an io_uring if you want to tweak linux.io_uring_params, such as the submission queue thread's CPU affinity or idle timeout (the kernel's default and ours is 1 second). The params struct p is passed by reference because the kernel needs to modify the parameters. Matches the interface of io_uring_queue_init_params() in liburing.

Parameters

entries:u16
p:*linux.io_uring_params
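
As an illustration of the kind of tweaking described above, here is a minimal sketch that builds params for a kernel-side submission queue polling thread and passes them by reference. The helper name sqpollParams, the choice of CPU 0, and the 2000 ms idle timeout are assumptions made for this example and are not part of the API. std.mem.zeroInit is used so that every field the caller does not set starts at zero.

```zig
const std = @import("std");
const linux = std.os.linux;

/// Hypothetical helper: params for an SQ polling thread pinned to `cpu`
/// that goes idle after `idle_ms` milliseconds.
fn sqpollParams(cpu: u32, idle_ms: u32) linux.io_uring_params {
    // zeroInit leaves all other fields (sq_entries, features, resv, ...) zero.
    return std.mem.zeroInit(linux.io_uring_params, .{
        .flags = linux.IORING_SETUP_SQPOLL | linux.IORING_SETUP_SQ_AFF,
        .sq_thread_cpu = cpu,
        .sq_thread_idle = idle_ms,
    });
}

pub fn main() !void {
    var params = sqpollParams(0, 2000); // assumed values, adjust as needed

    var ring = linux.IoUring.init_params(8, &params) catch |err| switch (err) {
        // SQPOLL may need extra privileges, and old kernels lack io_uring.
        error.PermissionDenied, error.SystemOutdated => {
            std.debug.print("io_uring unavailable: {s}\n", .{@errorName(err)});
            return;
        },
        else => return err,
    };
    defer ring.deinit();

    // The kernel has written the actual queue sizes back into params.
    std.debug.print("sq_entries={d} cq_entries={d}\n", .{ params.sq_entries, params.cq_entries });
}
```

Because the kernel fills in params during setup, a fresh struct should be built for each ring rather than reusing one that has already been passed in.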

Types

ReadBuffer
Used to select how the read should be handled.
RecvBuffer
Used to select how the recv call should be handled.
BufferGroup
Group of application provided buffers.

Functions

init
A friendly way to set up an io_uring, with default linux.io_uring_params.
init_params
A powerful way to set up an io_uring, if you want to tweak linux.io_uring_params such as submission queue thread CPU affinity or thread idle timeout.
get_sqe
Returns a pointer to a vacant SQE, or an error if the submission queue is full.
submit
Submits the SQEs acquired via get_sqe() to the kernel.
submit_and_wait
Like submit(), but allows waiting for events as well.
enter
Tell the kernel we have submitted SQEs and/or want to wait for CQEs.
flush_sq
Sync internal state with kernel ring state on the SQ side.
sq_ring_needs_enter
Returns true if we are not using an SQ thread (thus nobody submits but us),
sq_ready
Returns the number of flushed and unflushed SQEs pending in the submission queue.
cq_ready
Returns the number of CQEs in the completion queue, i.e.
copy_cqes
Copies as many CQEs as are ready, and that can fit into the destination `cqes` slice.
copy_cqe
Returns a copy of an I/O completion, waiting for it if necessary, and advancing the CQ ring.
cq_ring_needs_flush
Matches the implementation of cq_ring_needs_flush() in liburing.
cqe_seen
For advanced use cases only that implement custom completion queue methods.
cq_advance
For advanced use cases only that implement custom completion queue methods.
fsync
Queues (but does not submit) an SQE to perform an `fsync(2)`.
nop
Queues (but does not submit) an SQE to perform a no-op.
read
Queues (but does not submit) an SQE to perform a `read(2)` or `preadv(2)` depending on the buffer type.
write
Queues (but does not submit) an SQE to perform a `write(2)`.
splice
Queues (but does not submit) an SQE to perform a `splice(2)`.
read_fixed
Queues (but does not submit) an SQE to perform an IORING_OP_READ_FIXED.
writev
Queues (but does not submit) an SQE to perform a `pwritev()`.
write_fixed
Queues (but does not submit) an SQE to perform an IORING_OP_WRITE_FIXED.
accept
Queues (but does not submit) an SQE to perform an `accept4(2)` on a socket.
accept_multishot
Queues a multishot accept on a socket.
accept_direct
Queues an accept using direct (registered) file descriptors.
accept_multishot_direct
Queues a multishot accept using direct (registered) file descriptors.
connect
Queues (but does not submit) an SQE to perform a `connect(2)` on a socket.
epoll_ctl
Queues (but does not submit) an SQE to perform an `epoll_ctl(2)`.
recv
Queues (but does not submit) an SQE to perform a `recv(2)`.
send
Queues (but does not submit) an SQE to perform a `send(2)`.
send_zc
Queues (but does not submit) an SQE to perform an async zerocopy `send(2)`.
send_zc_fixed
Queues (but does not submit) an SQE to perform an async zerocopy `send(2)`.
recvmsg
Queues (but does not submit) an SQE to perform a `recvmsg(2)`.
sendmsg
Queues (but does not submit) an SQE to perform a `sendmsg(2)`.
sendmsg_zc
Queues (but does not submit) an SQE to perform an async zerocopy `sendmsg(2)`.
openat
Queues (but does not submit) an SQE to perform an `openat(2)`.
openat_direct
Queues an openat using direct (registered) file descriptors.
close
Queues (but does not submit) an SQE to perform a `close(2)`.
close_direct
Queues a close of a registered file descriptor.
timeout
Queues (but does not submit) an SQE to register a timeout operation.
timeout_remove
Queues (but does not submit) an SQE to remove an existing timeout operation.
link_timeout
Queues (but does not submit) an SQE to add a link timeout operation.
poll_add
Queues (but does not submit) an SQE to perform a `poll(2)`.
poll_remove
Queues (but does not submit) an SQE to remove an existing poll operation.
poll_update
Queues (but does not submit) an SQE to update the user data of an existing poll
fallocate
Queues (but does not submit) an SQE to perform an `fallocate(2)`.
statx
Queues (but does not submit) an SQE to perform a `statx(2)`.
cancel
Queues (but does not submit) an SQE to remove an existing operation.
shutdown
Queues (but does not submit) an SQE to perform a `shutdown(2)`.
renameat
Queues (but does not submit) an SQE to perform a `renameat2(2)`.
unlinkat
Queues (but does not submit) an SQE to perform an `unlinkat(2)`.
mkdirat
Queues (but does not submit) an SQE to perform a `mkdirat(2)`.
symlinkat
Queues (but does not submit) an SQE to perform a `symlinkat(2)`.
linkat
Queues (but does not submit) an SQE to perform a `linkat(2)`.
provide_buffers
Queues (but does not submit) an SQE to provide a group of buffers used for commands that read/receive data.
remove_buffers
Queues (but does not submit) an SQE to remove a group of provided buffers.
waitid
Queues (but does not submit) an SQE to perform a `waitid(2)`.
register_files
Registers an array of file descriptors.
register_files_update
Updates registered file descriptors.
register_files_sparse
Registers an empty (-1) file table of `nr_files` number of file descriptors.
register_eventfd
Registers the file descriptor for an eventfd that will be notified of completion events on
register_eventfd_async
Registers the file descriptor for an eventfd that will be notified of completion events on
unregister_eventfd
Unregister the registered eventfd file descriptor.
register_buffers
Registers an array of buffers for use with `read_fixed` and `write_fixed`.
unregister_buffers
Unregister the registered buffers.
get_probe
Returns an io_uring_probe which is used to probe the capabilities of the
unregister_files
Unregisters all registered file descriptors previously associated with the ring.
socket
Prepares a socket creation request.
socket_direct
Prepares a socket creation request for registered file at index `file_index`.
socket_direct_alloc
Prepares a socket creation request for registered file, index chosen by kernel (file index alloc).
bind
Queues (but does not submit) an SQE to perform a `bind(2)` on a socket.
listen
Queues (but does not submit) an SQE to perform a `listen(2)` on a socket.
cmd_sock
Prepares a cmd request for a socket.
setsockopt
Prepares set socket option for the optname argument, at the protocol
getsockopt
Prepares get socket option to retrieve the value for the option specified by
setup_buf_ring
Registers a shared buffer ring to be used with provided buffers.
buf_ring_init
Initialises `br` so that it is ready to be used.
buf_ring_mask
Calculates the appropriate size mask for a buffer ring.
buf_ring_add
Assigns `buffer` with the `br` buffer ring.
buf_ring_advance
Make `count` new buffers visible to the kernel.
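
The queue, submit, and complete flow that most of the functions above take part in can be sketched with the simplest operation, nop. The helper name nopRoundTrip and the user_data value 0x2a are illustrative assumptions. The ring here is created with init and flags set to 0, so no SQ polling thread is involved.

```zig
const std = @import("std");
const linux = std.os.linux;

/// Hypothetical helper: queues one no-op, submits it, and returns the
/// user_data echoed back in its completion.
fn nopRoundTrip(ring: *linux.IoUring, user_data: u64) !u64 {
    _ = try ring.nop(user_data); // queued only; nothing has reached the kernel yet
    const submitted = try ring.submit(); // hands the queued SQEs to the kernel
    std.debug.assert(submitted == 1);
    const cqe = try ring.copy_cqe(); // waits for the completion if necessary
    return cqe.user_data;
}

pub fn main() !void {
    var ring = try linux.IoUring.init(4, 0);
    defer ring.deinit();

    const echoed = try nopRoundTrip(&ring, 0x2a);
    std.debug.print("completed user_data=0x{x}\n", .{echoed});
}
```

The same pattern applies to the other "queues (but does not submit)" functions: each returns a pointer to the prepared SQE, and nothing happens until submit or submit_and_wait is called.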

Source

Implementation

pub fn init_params(entries: u16, p: *linux.io_uring_params) !IoUring {
    if (entries == 0) return error.EntriesZero;
    if (!std.math.isPowerOfTwo(entries)) return error.EntriesNotPowerOfTwo;

    assert(p.sq_entries == 0);
    assert(p.cq_entries == 0 or p.flags & linux.IORING_SETUP_CQSIZE != 0);
    assert(p.features == 0);
    assert(p.wq_fd == 0 or p.flags & linux.IORING_SETUP_ATTACH_WQ != 0);
    assert(p.resv[0] == 0);
    assert(p.resv[1] == 0);
    assert(p.resv[2] == 0);

    const res = linux.io_uring_setup(entries, p);
    switch (linux.E.init(res)) {
        .SUCCESS => {},
        .FAULT => return error.ParamsOutsideAccessibleAddressSpace,
        // The resv array contains non-zero data, p.flags contains an unsupported flag,
        // entries out of bounds, IORING_SETUP_SQ_AFF was specified without IORING_SETUP_SQPOLL,
        // or IORING_SETUP_CQSIZE was specified but linux.io_uring_params.cq_entries was invalid:
        .INVAL => return error.ArgumentsInvalid,
        .MFILE => return error.ProcessFdQuotaExceeded,
        .NFILE => return error.SystemFdQuotaExceeded,
        .NOMEM => return error.SystemResources,
        // IORING_SETUP_SQPOLL was specified but effective user ID lacks sufficient privileges,
        // or a container seccomp policy prohibits io_uring syscalls:
        .PERM => return error.PermissionDenied,
        .NOSYS => return error.SystemOutdated,
        else => |errno| return posix.unexpectedErrno(errno),
    }
    const fd = @as(posix.fd_t, @intCast(res));
    assert(fd >= 0);
    errdefer posix.close(fd);

    // Kernel versions 5.4 and up use only one mmap() for the submission and completion queues.
    // This is not an optional feature for us... if the kernel does it, we have to do it.
    // The thinking on this by the kernel developers was that both the submission and the
    // completion queue rings have sizes just over a power of two, but the submission queue ring
    // is significantly smaller with u32 slots. By bundling both in a single mmap, the kernel
    // gets the submission queue ring for free.
    // See https://patchwork.kernel.org/patch/11115257 for the kernel patch.
    // We do not support the double mmap() done before 5.4, because we want to keep the
    // init/deinit mmap paths simple and because io_uring has had many bug fixes even since 5.4.
    if ((p.features & linux.IORING_FEAT_SINGLE_MMAP) == 0) {
        return error.SystemOutdated;
    }

    // Check that the kernel has actually set params and that "impossible is nothing".
    assert(p.sq_entries != 0);
    assert(p.cq_entries != 0);
    assert(p.cq_entries >= p.sq_entries);

    // From here on, we only need to read from params, so pass `p` by value as immutable.
    // The completion queue shares the mmap with the submission queue, so pass `sq` there too.
    var sq = try SubmissionQueue.init(fd, p.*);
    errdefer sq.deinit();
    var cq = try CompletionQueue.init(fd, p.*, sq);
    errdefer cq.deinit();

    // Check that our starting state is as we expect.
    assert(sq.head.* == 0);
    assert(sq.tail.* == 0);
    assert(sq.mask == p.sq_entries - 1);
    // Allow flags.* to be non-zero, since the kernel may set IORING_SQ_NEED_WAKEUP at any time.
    assert(sq.dropped.* == 0);
    assert(sq.array.len == p.sq_entries);
    assert(sq.sqes.len == p.sq_entries);
    assert(sq.sqe_head == 0);
    assert(sq.sqe_tail == 0);

    assert(cq.head.* == 0);
    assert(cq.tail.* == 0);
    assert(cq.mask == p.cq_entries - 1);
    assert(cq.overflow.* == 0);
    assert(cq.cqes.len == p.cq_entries);

    return IoUring{
        .fd = fd,
        .sq = sq,
        .cq = cq,
        .flags = p.flags,
        .features = p.features,
    };
}
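
Since the implementation above rejects an entries value that is zero or not a power of two before it ever calls the kernel, a caller with an arbitrary desired queue depth may want to round it up first. A minimal sketch, assuming a hypothetical helper named ringEntries built on std.math.ceilPowerOfTwo:

```zig
const std = @import("std");

/// Hypothetical helper: rounds a desired queue depth up to a value that
/// passes the first two checks in init_params.
fn ringEntries(wanted: u16) error{ EntriesZero, Overflow }!u16 {
    if (wanted == 0) return error.EntriesZero;
    // For u16, ceilPowerOfTwo returns error.Overflow when wanted > 32768.
    return std.math.ceilPowerOfTwo(u16, wanted);
}

pub fn main() !void {
    const entries = try ringEntries(100);
    std.debug.print("requested 100, using {d}\n", .{entries});
}
```

The result can then be passed as the entries argument to init or init_params. Whether rounding up or failing fast is preferable depends on the application, so this is one option rather than a recommendation.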