Jacob Labor

Head of Data Infrastructure

Thursday, December 5th, 2024

Safe Systems Programming: Network Buffers in Rust

Network Buffers

In applications dealing with streaming data or network communication, efficiently managing incoming data is essential for maintaining performance and reliability. In order to achieve this we need a network buffer to handle variable-sized data chunks with ease.

This article will implement an UnorderedBuffer that meets the requirements to consume data from a UdpSocket:

let mut socket = std::net::UdpSocket;
let mut buffer = UnorderedBuffer<'_, '_>;

// Receive data into `UnorderedBuffer`
buffer.push_with(MTU, |mut view: PushView<'_, 's>| {
    // Handle fallible operations during push operation
    let written = socket.recv(view.data_mut())?;
    // Handle unknown insertion length
    view.set_length(written as u32);
    Ok(())
});

Recall the requirements for a network buffer:

  • Store network bytes within the same memory allocation
  • Packet length is unknown in advance
  • Packet receive is fallible

In network programming, managing data packets that arrive out of sequence is a frequent challenge, especially with protocols like UDP where delivery order is not guaranteed.

In future articles we will extend the current framework to implement an OrderedBuffer that can effectively buffer and reorder sequenced UDP traffic.

Network Buffers in Linux

In the Linux networking stack, the sk_buff (socket buffer) is the cornerstone data structure for managing network packets. It serves as a container for both the packet data and the metadata required to process and transport the packet across the network layers.

sk_buff is composed from two primary components:

  • Metadata buffer: This structure is a linked list of metadata elements that index into the data buffer.
  • Data buffer: Separate memory region that stores actual packet data managed as a ring buffer.

Metadata buffer represented as a linked list indexing into a ring buffer of packets

In this article we will be implementing the metadata buffer as a ring buffer since we do not have a requirement to be able to re-order packets.

If you are interested in an in-depth explanation about how sk_buff works then the Linux kernel docs are a good starting point:

struct sk_buff - The Linux Kernel documentation

Metadata Buffers

Metadata buffers are used to track package location and length within the data buffer. UnorderedMetadataBuffer is implemented using a ring buffer, Ring, since we require constant-time insertion and removal operations. Ring buffers also have better memory locality than linked lists.

/// Metadata for unordered network packets
pub struct UnorderedMetadata {
    storage_index: u32,
    storage_length: u32,
}

/// [`UnorderedMetadataBuffer`] that stores [`UnorderedMetadata`]
pub struct UnorderedMetadataBuffer<'m> {
    ring: Ring<'m, UnorderedMetadata>,
}

Metadata buffer represented as a ring buffer

If you want to learn more about how ring buffers work and how to implement them, read the previous article in this series:

Systems Design: Zero-Cost Ring Buffers in Rust

Unordered Buffer

An unordered buffer is implemented as a combination of a metadata buffer and data buffer. This section will explore the algorithm used to allocate and deallocate packet data.

pub struct OrderedBuffer<'m, 's> {
    receive_ring: OrderedMetadataBuffer<'m>,
    storage_ring: Ring<'s, u8>,
}

Allocating Packet Data

Allocating packet data is a three-phase procedure:

  1. Ensure that we have enough space in the data buffer to allocate a packet with maximum data size
  2. Allocate packet into data buffer
  3. Allocate metadata entry into metadata buffer

Consider an example with one packet already allocated:

Metadata buffer and data buffer with single metadata and packet already allocated

We initially preallocate an area of the buffer that corresponds to the maximum data size:

Metadata buffer and data buffer with a temporary region pre-allocated encompassed by PushView Metadata buffer and data buffer with temporary region shortened to actual packet data

Commit the allocation to the metadata buffer:

Metadata buffer and data buffer with two packets committed

Implementing UnorderedBuffer::push_with based on this logic:

impl<'s> UnorderedBuffer<'_, 's> {
    /// Push data into receive buffer of [`UnorderedBuffer`]
    pub fn push_with<R, F>(&mut self, max_size: u32, f: F) -> Result<R, std::io::Error>
    where
        F: FnOnce(PushView<'_, 's>) -> Result<R, std::io::Error>,
    {
        let tail_window = self.storage_ring.empty_tail_window();

				// Check if max size will fit in ring tail
        if tail_window >= max_size {
            self.push_with_inner(f)
        } else {
            Err(ErrorKind::WouldBlock.into())
        }
    }
}

We also implement UnorderedBuffer::push_with_inner which allocates metadata conditional on the storage allocation succeeding:

impl<'s> UnorderedBuffer<'_, 's> {
    fn push_with_inner<R, F>(&mut self, f: F) -> Result<R, std::io::Error>
    where
        F: FnOnce(PushView<'_, 's>) -> Result<R, std::io::Error>,
    {
        // Allocate metadata conditional on storage allocation succeeding
        self.receive_ring.push_with(|metadata: &mut UnorderedMetadata| {
            // Allocate message in storage
            self.storage_ring.push_many_with(|view: &mut RingManyMutView<'_, 's, u8>| {
                metadata.set_index(view.index());
                let res = f(PushView { metadata, view })?;
                Ok(res)
            })
        })
    }
}

During the allocation process we make use of PushView. PushView is a temporary view into the buffer that allows the caller to communicate information about packet length back to the buffer.

pub struct PushView<'a, 's> {
    metadata: &'a mut UnorderedMetadata,
    view: RingManyMutView<'a, 's, u8>,
}
impl PushView<'_, '_> {
    /// Set length of written data
    pub fn set_length(&mut self, len: u32) -> &mut Self {
        self.metadata.set_length(len);
        self.view.set_modified(len);
        self
    }
}

Recall the usage example of UnorderedBuffer::push_with:

buffer.push_with(MTU, |mut view: PushView<'_, 's>| {
    let written = socket.recv(view.data_mut())?;
    view.set_length(written as u32);
    Ok(())
});

If you are interested more about views or the implementation of RingManyMutView, read the previous article in this series:

Systems Design: Zero-Cost Ring Buffers in Rust

Deallocating Packet Data

Deallocating packet data is comparatively simpler:

  1. Deallocate metadata element tracking how many bytes need to be deallocated from the data buffer
  2. Deallocate bytes from data buffer

Metadata buffer and data buffer with two packets allocated Metadata buffer and data buffer with the first metadata element deallocated showing that two bytes must be deallocated from the data buffer

We can implement the metadata deallocation using UnorderedMetadataBuffer::pop_with:

impl UnorderedMetadataBuffer<'_> {
    /// Pop next metadata from [`UnorderedMetadataBuffer`]
    pub fn pop_with<R, F>(&mut self, deallocated: &mut u32, f: F) -> Option<R>
    where
        F: FnOnce(&UnorderedMetadata) -> Option<R>,
    {
        self.ring.pop_with(|m| {
            let data = m.data();
            let res = f(data)?;
            *deallocated = data.storage_length + data.storage_padding;
            Some(res)
        })
    }
}

Metadata buffer and data buffer with the first packet deallocated from both metadata and data buffers showing that two bytes were modified

Finally implementing UnorderedBuffer::pop_with:

impl UnorderedBuffer<'_, '_> {
    /// Pop data from receive buffer of [`UnorderedBuffer`]
    pub fn pop_with<R, F>(&mut self, f: F) -> Option<R>
    where
        F: FnOnce(&[u8]) -> Option<R>,
    {
        // Remove metadata conditional on storage removal succeeding
        let deallocated = &mut 0;
        let res = self.receive_ring.pop_with(deallocated, |m| {
            // SAFE: `storage_index` is a valid index into the storage buffer
            let data = self.storage_ring.get_many(m.index(), m.length()).unwrap();
            f(data)
        })?;
        // SAFE: `deallocated` does not exceeed the length of the storage buffer
        self.storage_ring.truncate_many_head(*deallocated).unwrap();
        Some(res)
    }
}

Padding Packet Data

The keen readers may have noticed that we must also handle the case where there is not enough space in the tail of the buffer to allocate a packet.

In this case we must pad the last element in the data buffer such that the buffer will wrap and the next allocation occurs at the head.

Metadata buffer and data buffer with single packet allocated near the tail of the buffer Metadata buffer and data buffer with the first packet padded to fill the tail of the data buffer while the temproary allocation for the next packet is made at the head of the data buffer Metadata buffer and data buffer with the second packet allocated at the head of the buffer

Adding this logic to our implementation of UnorderedBuffer::push_with:

impl<'s> UnorderedBuffer<'_, 's> {
    /// Push data into receive buffer of [`UnorderedBuffer`]
    pub fn push_with<R, F>(&mut self, max_size: u32, f: F) -> Result<R, std::io::Error>
    where
        F: FnOnce(PushView<'_, 's>) -> Result<R, std::io::Error>,
    {
        let tail_window = self.storage_ring.empty_tail_window();
        let head_window =  self.storage_ring.window() - tail_window;

        // Check if max size will fit in ring tail
        if tail_window >= max_size {
            self.push_with_inner(f)
        // Check if max size will fit in ring head
        } else if head_window >= max_size {
            // Pad the tail window
            self.storage_ring.push_many(tail_window)?;
            // Add padding to last metadata element if it exists
            if !self.receive_ring.is_empty() {
                let last_metadata_index = self.receive_ring.last();
                // SAFE: Buffer is non-empty therefore the last element exists
                let last_metadata = self.receive_ring.get_mut(last_metadata_index).unwrap();
                last_metadata.pad(tail_window);
            }
            // Do not undo the tail padding in the case of failure since `max_size` is not
            // expected to change during runtime as it is often hardware dependent.
            let res = self.push_with_inner(f)?;
            Ok(res)
        } else {
            Err(ErrorKind::WouldBlock.into())
        }
    }
}

In order to track the padding that was applied to the packet we must add a third field to UnorderedMetadata:

pub struct UnorderedMetadata {
    storage_index: u32,
    storage_length: u32,
    storage_padding: u32,
}
impl UnorderedMetadata {
    /// Pad [`UnorderedMetadata`] element with empty padding
    pub fn pad(&mut self, padding: u32) -> &mut Self {
        self.storage_padding += padding;
        self
    }
}

Conclusion

In this article, we’ve implemented an UnorderedBuffer in Rust to efficiently manage network data, particularly for UDP network sockets. By utilizing ring buffers for both metadata and packet storage, the design ensures constant-time operations and excellent memory locality.

The design utilizies zero-cost programming practices to create an abstract API that is easy to reason about in the presence of fallible operations.

This article paves the way for our next article where we will implement an OrderedBuffer to reorder sequenced network traffic, further enhancing our capability for network processing.

Jacob Labor

Head of Data Infrastructure

linkedin

Join our Team

If you're at the pinnacle of your game and seek to collaborate with the elite, Anti Capital is your arena. Join us, where the best meet the best.

Recent Articles