Track tasks memory usage using a custom allocator #7647

spoutn1k · 2025-09-25T05:51:53Z

spoutn1k
Sep 25, 2025

Hi all, I have been trying to benchmark the memory usage of a server. I tried using multiple custom allocators, and the results seem to not align with the memory usage reported by systemd.

The following benchmark highlights the behaviour I am seeing:

#![allow(unused_variables)]

use axum::{
    Router,
    body::Body,
    extract::Query,
    http::{Response, StatusCode, header::CONTENT_TYPE},
};
use divan::AllocProfiler;
use tokio::task::JoinSet;

#[global_allocator]
static ALLOC: AllocProfiler = AllocProfiler::system();

#[divan::bench]
pub fn test_vec() {
    let test: Vec<u8> = Vec::with_capacity(1024 * 1024 * 1024);
}

#[divan::bench]
pub fn test_vec_async() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_time()
        .enable_io()
        .build()
        .expect("Failed to build tokio runtime")
        .block_on(async move {
            let test: Vec<u8> = Vec::with_capacity(1024 * 1024 * 1024);
        })
}

#[divan::bench]
pub fn test_vec_async_join() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_time()
        .enable_io()
        .build()
        .expect("Failed to build tokio runtime")
        .block_on(async move {
            let mut join_set = JoinSet::new();
            for _ in 1..10 {
                join_set.spawn(async move {
                    let test: Vec<u8> = Vec::with_capacity(1024 * 1024 * 1024);
                });
            }

            join_set.join_all().await;
        })
}

#[divan::bench]
pub fn test_vec_axum() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_time()
        .enable_io()
        .build()
        .expect("Failed to build tokio runtime");

    pub async fn scrape(Query(_): Query<()>) -> Response<Body> {
        let test: Vec<u8> = Vec::with_capacity(1024 * 1024 * 1024);

        Response::builder()
            .status(StatusCode::OK)
            .header(
                CONTENT_TYPE,
                "application/openmetrics-text; version=1.0.0; charset=utf-8",
            )
            .body(Body::from("hello"))
            .expect("Failed to build response")
    }

    let (kill_s, kill_r) = tokio::sync::oneshot::channel();

    let server = async move {
        let listener = tokio::net::TcpListener::bind(("0.0.0.0", 12345))
            .await
            .expect("Failed to bind to port 12345");

        axum::serve(listener, {
            Router::new().route("/", axum::routing::get(scrape))
        })
        .with_graceful_shutdown(async move {
            kill_r.await.ok();
        })
        .await
        .expect("Failed to serve app.");
    };

    let request = async move {
        reqwest::get("http://localhost:12345/")
            .await
            .expect("Request failed")
            .text()
            .await
            .expect("Parsing response failed");

        kill_s.send(()).ok();
    };

    rt.block_on(async move {
        let mut join_set = JoinSet::new();
        join_set.spawn(request);
        join_set.spawn(server);
        join_set.join_all().await
    });
}

fn main() {
    divan::main()
}

The output stops seeing allocations as soon as I spawn tasks:

Timer precision: 41 ns
allocate                fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ test_vec             866.5 ns      │ 34.07 µs      │ 908.5 ns      │ 1.304 µs      │ 100     │ 100
│                       max alloc:    │               │               │               │         │
│                         1           │ 1             │ 1             │ 1             │         │
│                         1.073 GB    │ 1.073 GB      │ 1.073 GB      │ 1.073 GB      │         │
│                       alloc:        │               │               │               │         │
│                         1           │ 1             │ 1             │ 1             │         │
│                         1.073 GB    │ 1.073 GB      │ 1.073 GB      │ 1.073 GB      │         │
│                       dealloc:      │               │               │               │         │
│                         1           │ 1             │ 1             │ 1             │         │
│                         1.073 GB    │ 1.073 GB      │ 1.073 GB      │ 1.073 GB      │         │
├─ test_vec_async       199.4 µs      │ 926.3 µs      │ 303 µs        │ 309.9 µs      │ 100     │ 100
│                       max alloc:    │               │               │               │         │
│                         163         │ 202           │ 163           │ 163.4         │         │
│                         1.073 GB    │ 1.073 GB      │ 1.073 GB      │ 1.073 GB      │         │
│                       alloc:        │               │               │               │         │
│                         168         │ 207           │ 168           │ 168.4         │         │
│                         1.073 GB    │ 1.073 GB      │ 1.073 GB      │ 1.073 GB      │         │
│                       dealloc:      │               │               │               │         │
│                         48          │ 48            │ 48            │ 48            │         │
│                         1.073 GB    │ 1.073 GB      │ 1.073 GB      │ 1.073 GB      │         │
│                       grow:         │               │               │               │         │
│                         16          │ 15            │ 15.5          │ 15.49         │         │
│                         300 B       │ 172 B         │ 236 B         │ 236 B         │         │
├─ test_vec_async_join  222.4 µs      │ 509.4 µs      │ 275.7 µs      │ 293.2 µs      │ 100     │ 100
│                       max alloc:    │               │               │               │         │
│                         182         │ 181           │ 181           │ 181           │         │
│                         79.94 KB    │ 79.75 KB      │ 79.75 KB      │ 79.81 KB      │         │
│                       alloc:        │               │               │               │         │
│                         187         │ 186           │ 186           │ 186           │         │
│                         80.33 KB    │ 80.26 KB      │ 80.26 KB      │ 80.27 KB      │         │
│                       dealloc:      │               │               │               │         │
│                         66          │ 66            │ 66            │ 66            │         │
│                         5.12 KB     │ 4.992 KB      │ 4.992 KB      │ 5.045 KB      │         │
│                       grow:         │               │               │               │         │
│                         16          │ 15            │ 15            │ 15.4          │         │
│                         300 B       │ 172 B         │ 172 B         │ 225.1 B       │         │
╰─ test_vec_axum        744.8 µs      │ 5.959 ms      │ 851.9 µs      │ 924.5 µs      │ 100     │ 100
                        max alloc:    │               │               │               │         │
                          167         │ 167           │ 167           │ 167           │         │
                          79.74 KB    │ 79.62 KB      │ 79.68 KB      │ 79.67 KB      │         │
                        alloc:        │               │               │               │         │
                          173         │ 173           │ 173           │ 173           │         │
                          80.19 KB    │ 80.19 KB      │ 80.19 KB      │ 80.19 KB      │         │
                        dealloc:      │               │               │               │         │
                          55          │ 55            │ 55            │ 55            │         │
                          5.109 KB    │ 4.981 KB      │ 5.045 KB      │ 5.036 KB      │         │
                        grow:         │               │               │               │         │
                          16          │ 15            │ 15.5          │ 15.43         │         │
                          300 B       │ 172 B         │ 236 B         │ 227.6 B       │         │

This also happened when generating flamehgraphs with other allocators. Is this expected ? I imagine there is some kind of thread situation in the background. Is there a way to specify the allocator to use in tokio ?

Answered by ADD-SP

Sep 25, 2025

tokio always use the allocator specified by the #[global_allocator]

I tried a very simple toy profiler, which correctly reports the memory usage.

#![allow(unused_variables)]

use tokio::task::JoinSet;
use std::alloc::{GlobalAlloc, System, Layout};
use std::sync::atomic::{AtomicUsize, Ordering::*};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static DEALLOCATED: AtomicUsize = AtomicUsize::new(0);

struct MyAllocator;

#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;

unsafe impl GlobalAlloc for MyAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Relaxed);
        unsafe { System.alloc(layout) }
    }

    un…

View full answer

ADD-SP · 2025-09-25T12:26:47Z

ADD-SP
Sep 25, 2025
Collaborator

tokio always use the allocator specified by the #[global_allocator]

I tried a very simple toy profiler, which correctly reports the memory usage.

#![allow(unused_variables)]

use tokio::task::JoinSet;
use std::alloc::{GlobalAlloc, System, Layout};
use std::sync::atomic::{AtomicUsize, Ordering::*};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static DEALLOCATED: AtomicUsize = AtomicUsize::new(0);

struct MyAllocator;

#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;

unsafe impl GlobalAlloc for MyAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCATED.fetch_add(layout.size(), Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

pub fn test_vec_async_join() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_time()
        .enable_io()
        .build()
        .expect("Failed to build tokio runtime")
        .block_on(async move {
            let mut join_set = JoinSet::new();
            for _ in 1..10 {
                join_set.spawn(async move {
                    let test: Vec<u8> = Vec::with_capacity(1024 * 1024 * 1024);
                    // avoid dead code elimination
                    assert!(test.capacity() >= 1024 * 1024 * 1024);
                });
            }
            join_set.join_all().await;
        })
}


fn main() {
    test_vec_async_join();
    println!("Allocated: {} GiB", ALLOCATED.load(Relaxed) / 1024 / 1024 / 1024);
    println!("Deallocated: {} GiB", DEALLOCATED.load(Relaxed) / 1024 / 1024 / 1024);
}

1 reply

spoutn1k Sep 25, 2025
Author

So it looks like this is on the tracing allocators. Thank you for your help !

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Track tasks memory usage using a custom allocator #7647

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

Track tasks memory usage using a custom allocator #7647

Uh oh!

spoutn1k Sep 25, 2025

Replies: 1 comment · 1 reply

Uh oh!

Uh oh!

ADD-SP Sep 25, 2025 Collaborator

Uh oh!

spoutn1k Sep 25, 2025 Author

spoutn1k
Sep 25, 2025

Replies: 1 comment 1 reply

ADD-SP
Sep 25, 2025
Collaborator

spoutn1k Sep 25, 2025
Author