init_params
A powerful way to set up an io_uring when you want to tweak linux.io_uring_params, such as the submission
queue thread CPU affinity or thread idle timeout (the kernel's default, and ours, is 1 second).
The params are passed by reference (as `p`) because the kernel needs to modify them.
Matches the interface of io_uring_queue_init_params() in liburing.
Parameters
- entries: u16
- p: *linux.io_uring_params
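As a rough illustration of the description above, the sketch below zero-initialises the params, asks for a larger completion queue via IORING_SETUP_CQSIZE, and then reads back the sizes the kernel wrote into `p`. It assumes a recent Zig release in which the type is reachable as `std.os.linux.IoUring`; the sizes 16 and 64 are arbitrary values chosen for the example.

```zig
const std = @import("std");
const linux = std.os.linux;

pub fn main() !void {
    // Start from all zeroes: init_params asserts that sq_entries, features
    // and resv are zero before the syscall is made.
    var params = std.mem.zeroInit(linux.io_uring_params, .{
        .flags = linux.IORING_SETUP_CQSIZE,
        .cq_entries = 64, // only honoured because IORING_SETUP_CQSIZE is set
    });

    // `entries` must be a non-zero power of two.
    var ring = try linux.IoUring.init_params(16, &params);
    defer ring.deinit();

    // The kernel has now filled in the actual queue sizes and feature bits.
    std.debug.print("sq_entries={d} cq_entries={d} features=0x{x}\n", .{
        params.sq_entries,
        params.cq_entries,
        params.features,
    });
}
```

Because `params` is written to by the kernel, it has to be a `var` and is best treated as read-only once the call has returned.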
Types
- ReadBuffer
- Used to select how the read should be handled.
- RecvBuffer
- Used to select how the recv call should be handled.
- BufferGroup
- Group of application provided buffers.
Functions
- init
- A friendly way to set up an io_uring, with default linux.io_uring_params.
- init_params
- A powerful way to set up an io_uring, if you want to tweak linux.io_uring_params such as submission
- get_sqe
- Returns a pointer to a vacant SQE, or an error if the submission queue is full.
- submit
- Submits the SQEs acquired via get_sqe() to the kernel.
- submit_and_wait
- Like submit(), but allows waiting for events as well.
- enter
- Tell the kernel we have submitted SQEs and/or want to wait for CQEs.
- flush_sq
- Sync internal state with kernel ring state on the SQ side.
- sq_ring_needs_enter
- Returns true if we are not using an SQ thread (thus nobody submits but us),
- sq_ready
- Returns the number of flushed and unflushed SQEs pending in the submission queue.
- cq_ready
- Returns the number of CQEs in the completion queue, i.e.
- copy_cqes
- Copies as many CQEs as are ready, and that can fit into the destination `cqes` slice.
- copy_cqe
- Returns a copy of an I/O completion, waiting for it if necessary, and advancing the CQ ring.
- cq_ring_needs_flush
- Matches the implementation of cq_ring_needs_flush() in liburing.
- cqe_seen
- For advanced use cases only that implement custom completion queue methods.
- cq_advance
- For advanced use cases only that implement custom completion queue methods.
- fsync
- Queues (but does not submit) an SQE to perform an `fsync(2)`.
- nop
- Queues (but does not submit) an SQE to perform a no-op.
- read
- Queues (but does not submit) an SQE to perform a `read(2)` or `preadv(2)` depending on the buffer type.
- write
- Queues (but does not submit) an SQE to perform a `write(2)`.
- splice
- Queues (but does not submit) an SQE to perform a `splice(2)`.
- read_fixed
- Queues (but does not submit) an SQE to perform an IORING_OP_READ_FIXED.
- writev
- Queues (but does not submit) an SQE to perform a `pwritev()`.
- write_fixed
- Queues (but does not submit) an SQE to perform an IORING_OP_WRITE_FIXED.
- accept
- Queues (but does not submit) an SQE to perform an `accept4(2)` on a socket.
- accept_multishot
- Queues a multishot accept on a socket.
- accept_direct
- Queues an accept using direct (registered) file descriptors.
- accept_multishot_direct
- Queues a multishot accept using direct (registered) file descriptors.
- connect
- Queues (but does not submit) an SQE to perform a `connect(2)` on a socket.
- epoll_ctl
- Queues (but does not submit) an SQE to perform an `epoll_ctl(2)`.
- recv
- Queues (but does not submit) an SQE to perform a `recv(2)`.
- send
- Queues (but does not submit) an SQE to perform a `send(2)`.
- send_zc
- Queues (but does not submit) an SQE to perform an async zerocopy `send(2)`.
- send_zc_fixed
- Queues (but does not submit) an SQE to perform an async zerocopy `send(2)`.
- recvmsg
- Queues (but does not submit) an SQE to perform a `recvmsg(2)`.
- sendmsg
- Queues (but does not submit) an SQE to perform a `sendmsg(2)`.
- sendmsg_zc
- Queues (but does not submit) an SQE to perform an async zerocopy `sendmsg(2)`.
- openat
- Queues (but does not submit) an SQE to perform an `openat(2)`.
- openat_direct
- Queues an openat using direct (registered) file descriptors.
- close
- Queues (but does not submit) an SQE to perform a `close(2)`.
- close_direct
- Queues a close of a registered file descriptor.
- timeout
- Queues (but does not submit) an SQE to register a timeout operation.
- timeout_remove
- Queues (but does not submit) an SQE to remove an existing timeout operation.
- link_timeout
- Queues (but does not submit) an SQE to add a link timeout operation.
- poll_add
- Queues (but does not submit) an SQE to perform a `poll(2)`.
- poll_remove
- Queues (but does not submit) an SQE to remove an existing poll operation.
- poll_update
- Queues (but does not submit) an SQE to update the user data of an existing poll
- fallocate
- Queues (but does not submit) an SQE to perform an `fallocate(2)`.
- statx
- Queues (but does not submit) an SQE to perform a `statx(2)`.
- cancel
- Queues (but does not submit) an SQE to remove an existing operation.
- shutdown
- Queues (but does not submit) an SQE to perform a `shutdown(2)`.
- renameat
- Queues (but does not submit) an SQE to perform a `renameat2(2)`.
- unlinkat
- Queues (but does not submit) an SQE to perform a `unlinkat(2)`.
- mkdirat
- Queues (but does not submit) an SQE to perform a `mkdirat(2)`.
- symlinkat
- Queues (but does not submit) an SQE to perform a `symlinkat(2)`.
- linkat
- Queues (but does not submit) an SQE to perform a `linkat(2)`.
- provide_buffers
- Queues (but does not submit) an SQE to provide a group of buffers used for commands that read/receive data.
- remove_buffers
- Queues (but does not submit) an SQE to remove a group of provided buffers.
- waitid
- Queues (but does not submit) an SQE to perform a `waitid(2)`.
- register_files
- Registers an array of file descriptors.
- register_files_update
- Updates registered file descriptors.
- register_files_sparse
- Registers an empty (-1) file table of `nr_files` number of file descriptors.
- register_eventfd
- Registers the file descriptor for an eventfd that will be notified of completion events on
- register_eventfd_async
- Registers the file descriptor for an eventfd that will be notified of completion events on
- unregister_eventfd
- Unregisters the registered eventfd file descriptor.
- register_buffers
- Registers an array of buffers for use with `read_fixed` and `write_fixed`.
- unregister_buffers
- Unregisters the registered buffers.
- get_probe
- Returns an io_uring_probe which is used to probe the capabilities of the
- unregister_files
- Unregisters all registered file descriptors previously associated with the ring.
- socket
- Prepares a socket creation request.
- socket_direct
- Prepares a socket creation request for registered file at index `file_index`.
- socket_direct_alloc
- Prepares a socket creation request for registered file, index chosen by kernel (file index alloc).
- bind
- Queues (but does not submit) an SQE to perform a `bind(2)` on a socket.
- listen
- Queues (but does not submit) an SQE to perform a `listen(2)` on a socket.
- cmd_sock
- Prepares a cmd request for a socket.
- setsockopt
- Prepares set socket option for the optname argument, at the protocol
- getsockopt
- Prepares get socket option to retrieve the value for the option specified by
- setup_buf_ring
- Registers a shared buffer ring to be used with provided buffers.
- buf_ring_init
- Initialises `br` so that it is ready to be used.
- buf_ring_mask
- Calculates the appropriate size mask for a buffer ring.
- buf_ring_add
- Assigns `buffer` with the `br` buffer ring.
- buf_ring_advance
- Makes `count` new buffers visible to the kernel.
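Most of the functions listed above only queue an SQE; nothing reaches the kernel until submit() (or submit_and_wait()) is called, and results come back as CQEs. A minimal sketch of that round trip, using nop as the simplest operation, might look as follows. It assumes the type is reachable as `std.os.linux.IoUring`; the ring size 8 and the `user_data` value 0xaaaa are arbitrary.

```zig
const std = @import("std");
const linux = std.os.linux;

pub fn main() !void {
    // init() uses default params; 8 entries, no setup flags.
    var ring = try linux.IoUring.init(8, 0);
    defer ring.deinit();

    // Queue a no-op. This only fills in an SQE; the kernel has not seen it yet.
    _ = try ring.nop(0xaaaa);

    // Hand the queued SQE to the kernel.
    const submitted = try ring.submit();
    std.debug.print("submitted {d} SQE(s)\n", .{submitted});

    // Wait for the completion and copy it out of the CQ ring.
    const cqe = try ring.copy_cqe();
    std.debug.print("user_data=0x{x} res={d}\n", .{ cqe.user_data, cqe.res });
}
```

The `user_data` value is how a completion is matched back to the request that produced it, so real code would normally store an identifier or pointer there.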
Source
Implementation
pub fn init_params(entries: u16, p: *linux.io_uring_params) !IoUring {
    if (entries == 0) return error.EntriesZero;
    if (!std.math.isPowerOfTwo(entries)) return error.EntriesNotPowerOfTwo;
    assert(p.sq_entries == 0);
    assert(p.cq_entries == 0 or p.flags & linux.IORING_SETUP_CQSIZE != 0);
    assert(p.features == 0);
    assert(p.wq_fd == 0 or p.flags & linux.IORING_SETUP_ATTACH_WQ != 0);
    assert(p.resv[0] == 0);
    assert(p.resv[1] == 0);
    assert(p.resv[2] == 0);
    const res = linux.io_uring_setup(entries, p);
    switch (linux.E.init(res)) {
        .SUCCESS => {},
        .FAULT => return error.ParamsOutsideAccessibleAddressSpace,
        // The resv array contains non-zero data, p.flags contains an unsupported flag,
        // entries out of bounds, IORING_SETUP_SQ_AFF was specified without IORING_SETUP_SQPOLL,
        // or IORING_SETUP_CQSIZE was specified but linux.io_uring_params.cq_entries was invalid:
        .INVAL => return error.ArgumentsInvalid,
        .MFILE => return error.ProcessFdQuotaExceeded,
        .NFILE => return error.SystemFdQuotaExceeded,
        .NOMEM => return error.SystemResources,
        // IORING_SETUP_SQPOLL was specified but effective user ID lacks sufficient privileges,
        // or a container seccomp policy prohibits io_uring syscalls:
        .PERM => return error.PermissionDenied,
        .NOSYS => return error.SystemOutdated,
        else => |errno| return posix.unexpectedErrno(errno),
    }
    const fd = @as(posix.fd_t, @intCast(res));
    assert(fd >= 0);
    errdefer posix.close(fd);
    // Kernel versions 5.4 and up use only one mmap() for the submission and completion queues.
    // This is not an optional feature for us... if the kernel does it, we have to do it.
    // The thinking on this by the kernel developers was that both the submission and the
    // completion queue rings have sizes just over a power of two, but the submission queue ring
    // is significantly smaller with u32 slots. By bundling both in a single mmap, the kernel
    // gets the submission queue ring for free.
    // See https://patchwork.kernel.org/patch/11115257 for the kernel patch.
    // We do not support the double mmap() done before 5.4, because we want to keep the
    // init/deinit mmap paths simple and because io_uring has had many bug fixes even since 5.4.
    if ((p.features & linux.IORING_FEAT_SINGLE_MMAP) == 0) {
        return error.SystemOutdated;
    }
    // Check that the kernel has actually set params and that "impossible is nothing".
    assert(p.sq_entries != 0);
    assert(p.cq_entries != 0);
    assert(p.cq_entries >= p.sq_entries);
    // From here on, we only need to read from params, so pass `p` by value as immutable.
    // The completion queue shares the mmap with the submission queue, so pass `sq` there too.
    var sq = try SubmissionQueue.init(fd, p.*);
    errdefer sq.deinit();
    var cq = try CompletionQueue.init(fd, p.*, sq);
    errdefer cq.deinit();
    // Check that our starting state is as we expect.
    assert(sq.head.* == 0);
    assert(sq.tail.* == 0);
    assert(sq.mask == p.sq_entries - 1);
    // Allow flags.* to be non-zero, since the kernel may set IORING_SQ_NEED_WAKEUP at any time.
    assert(sq.dropped.* == 0);
    assert(sq.array.len == p.sq_entries);
    assert(sq.sqes.len == p.sq_entries);
    assert(sq.sqe_head == 0);
    assert(sq.sqe_tail == 0);
    assert(cq.head.* == 0);
    assert(cq.tail.* == 0);
    assert(cq.mask == p.cq_entries - 1);
    assert(cq.overflow.* == 0);
    assert(cq.cqes.len == p.cq_entries);
    return IoUring{
        .fd = fd,
        .sq = sq,
        .cq = cq,
        .flags = p.flags,
        .features = p.features,
    };
}
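Callers may want to treat some of the errors returned by the implementation above as "io_uring is not usable here" rather than as fatal. The following is one possible sketch of that pattern. The helper name `tryOpenRing` is hypothetical, and the decision to fall back on `error.SystemOutdated` and `error.PermissionDenied` is an assumption made for illustration, not something the source prescribes.

```zig
const std = @import("std");
const linux = std.os.linux;

/// Hypothetical helper: returns null when io_uring cannot be used on this
/// system, and propagates every other error to the caller.
fn tryOpenRing(entries: u16) !?linux.IoUring {
    var params = std.mem.zeroInit(linux.io_uring_params, .{});
    const ring = linux.IoUring.init_params(entries, &params) catch |err| switch (err) {
        // Kernel too old (ENOSYS or no single mmap), or blocked by privileges/seccomp.
        error.SystemOutdated, error.PermissionDenied => return null,
        else => |e| return e,
    };
    return ring;
}

pub fn main() !void {
    if (try tryOpenRing(32)) |opened| {
        var ring = opened;
        defer ring.deinit();
        std.debug.print("io_uring available\n", .{});
    } else {
        std.debug.print("io_uring unavailable, falling back\n", .{});
    }
}
```

Note that `error.EntriesZero` and `error.EntriesNotPowerOfTwo` are returned before any syscall is made, so they indicate a caller bug and are deliberately not folded into the fallback case.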