

@mkroening (Member) commented Oct 21, 2025:

Similar to #1994, this makes the internals of the physical and virtual memory allocators private. Note that this will slow down parts of paging for now, since it is no longer possible to hold a global `FrameAlloc` lock. This means the lock has to be reacquired again and again until we improve the paging code or the `FrameAlloc` code.

Depends on #2008.
Closes #1994.

@mkroening mkroening self-assigned this Oct 21, 2025
@mkroening mkroening force-pushed the page_alloc branch 4 times, most recently from 374c53b to d32f419 Compare October 21, 2025 15:11
@github-actions bot (Contributor) left a comment:
Benchmark Results

| Benchmark | Current: 786d4e9 | Previous: 80d7cc6 | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | 109.21 s | 111.54 s | 0.98 |
| startup_benchmark File Size | 0.91 MB | 0.89 MB | 1.01 |
| Startup Time - 1 core | 0.91 s (±0.03 s) | 0.90 s (±0.03 s) | 1.02 |
| Startup Time - 2 cores | 0.91 s (±0.03 s) | 0.90 s (±0.04 s) | 1.01 |
| Startup Time - 4 cores | 0.92 s (±0.03 s) | 0.90 s (±0.04 s) | 1.02 |
| multithreaded_benchmark Build Time | 111.64 s | 110.94 s | 1.01 |
| multithreaded_benchmark File Size | 1.01 MB | 1.00 MB | 1.01 |
| Multithreaded Pi Efficiency - 2 Threads | 90.87 % (±9.00 %) | 86.63 % (±8.31 %) | 1.05 |
| Multithreaded Pi Efficiency - 4 Threads | 44.15 % (±4.21 %) | 43.85 % (±3.75 %) | 1.01 |
| Multithreaded Pi Efficiency - 8 Threads | 25.47 % (±2.03 %) | 24.82 % (±1.93 %) | 1.03 |
| micro_benchmarks Build Time | 255.58 s | 261.05 s | 0.98 |
| micro_benchmarks File Size | 1.01 MB | 1.01 MB | 1.01 |
| Scheduling time - 1 thread | 135.35 ticks (±37.35 ticks) | 130.02 ticks (±29.33 ticks) | 1.04 |
| Scheduling time - 2 threads | 77.00 ticks (±17.59 ticks) | 74.47 ticks (±19.53 ticks) | 1.03 |
| Micro - Time for syscall (getpid) | 8.31 ticks (±4.31 ticks) | 6.69 ticks (±3.39 ticks) | 1.24 |
| Memcpy speed - (built_in) block size 4096 | 59053.09 MByte/s (±42425.30 MByte/s) | 61589.06 MByte/s (±43639.11 MByte/s) | 0.96 |
| Memcpy speed - (built_in) block size 1048576 | 23561.68 MByte/s (±21470.34 MByte/s) | 22767.71 MByte/s (±20609.88 MByte/s) | 1.03 |
| Memcpy speed - (built_in) block size 16777216 | 15328.77 MByte/s (±12685.30 MByte/s) | 14645.26 MByte/s (±12130.00 MByte/s) | 1.05 |
| Memset speed - (built_in) block size 4096 | 59528.38 MByte/s (±42735.96 MByte/s) | 61902.61 MByte/s (±43834.96 MByte/s) | 0.96 |
| Memset speed - (built_in) block size 1048576 | 24018.20 MByte/s (±21662.29 MByte/s) | 23121.96 MByte/s (±20732.33 MByte/s) | 1.04 |
| Memset speed - (built_in) block size 16777216 | 15798.83 MByte/s (±13006.40 MByte/s) | 15049.49 MByte/s (±12384.98 MByte/s) | 1.05 |
| Memcpy speed - (rust) block size 4096 | 55765.23 MByte/s (±41139.86 MByte/s) | 51062.02 MByte/s (±37732.03 MByte/s) | 1.09 |
| Memcpy speed - (rust) block size 1048576 | 23636.86 MByte/s (±21020.91 MByte/s) | 21034.25 MByte/s (±18494.59 MByte/s) | 1.12 |
| Memcpy speed - (rust) block size 16777216 | 15516.08 MByte/s (±13081.15 MByte/s) | 13683.17 MByte/s (±11267.50 MByte/s) | 1.13 |
| Memset speed - (rust) block size 4096 | 56038.64 MByte/s (±41276.95 MByte/s) | 52107.73 MByte/s (±38544.76 MByte/s) | 1.08 |
| Memset speed - (rust) block size 1048576 | 24002.94 MByte/s (±21148.69 MByte/s) | 21343.88 MByte/s (±18636.73 MByte/s) | 1.12 |
| Memset speed - (rust) block size 16777216 | 15838.53 MByte/s (±13244.53 MByte/s) | 13980.79 MByte/s (±11428.92 MByte/s) | 1.13 |
| alloc_benchmarks Build Time | 238.77 s | 238.48 s | 1.00 |
| alloc_benchmarks File Size | 0.97 MB | 0.96 MB | 1.01 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 15990.58 Ticks (±735.75 Ticks) | 15620.57 Ticks (±710.75 Ticks) | 1.02 |
| Allocations - Average Allocation time (no fail) | 15990.58 Ticks (±735.75 Ticks) | 15620.57 Ticks (±710.75 Ticks) | 1.02 |
| Allocations - Average Deallocation time | 1974.50 Ticks (±791.12 Ticks) | 2001.83 Ticks (±1012.97 Ticks) | 0.99 |
| mutex_benchmark Build Time | 243.75 s | 243.50 s | 1.00 |
| mutex_benchmark File Size | 1.02 MB | 1.01 MB | 1.01 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 26.20 ns (±6.14 ns) | 25.56 ns (±5.79 ns) | 1.03 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 24.66 ns (±3.27 ns) | 24.64 ns (±3.12 ns) | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@mkroening (Member, Author) commented:

@m-mueller678, what do you think about `PageAlloc` and `FrameAlloc` instead of going back to free-standing functions? Feel free to review. :)

@m-mueller678 (Contributor) left a comment:

Looks good to me, save for the few split unsafe blocks. I like the change from free functions to methods.

```rust
pub fn init() {
    crate::mm::physicalmem::init();
    crate::mm::virtualmem::init();
    unsafe {
```
@m-mueller678 (Contributor) commented Oct 22, 2025:

could be merged into a single unsafe block

@mkroening (Member, Author) replied:

Having these operations in separate blocks was on purpose; see `clippy::multiple_unsafe_ops_per_block`. Arguably, the situation in the kernel is still far from ideal, but I think this is nevertheless a step in the right direction.

This function itself should be unsafe too, I just noticed. Let's change that.

```rust
        Ok(Self(range, PhantomData))
    }

    pub unsafe fn from_raw(range: PageRange) -> Self {
```
@m-mueller678 (Contributor) commented:
Feels like there should be an `into_raw` too for symmetry.

@mkroening (Member, Author) replied:
Makes sense. I added it. 👍

```rust
    unsafe fn deallocate(range: PageRange);
}

pub struct PageRangeBox<A: PageRangeAllocator>(PageRange, PhantomData<A>);
```
@m-mueller678 (Contributor) commented:
I very much like the idea of `PageRangeBox` 👍

```rust
    InterruptTicketMutex::new(FreeList::new());
pub static TOTAL_MEMORY: AtomicUsize = AtomicUsize::new(0);

pub struct FrameAlloc;
```
@m-mueller678 (Contributor) commented:
I like the idea of having these as methods rather than free-standing functions, both to enable `PageRangeBox` and to have something natural on which to implement x86_64's `FrameAllocator`.

mkroening and others added 3 commits October 23, 2025 16:46
This replaces uses of KERNEL_FREE_LIST with PageAlloc.
This replaces uses of PHYSICAL_FREE_LIST with FrameAlloc.

Note that this will slow down parts of x86-64 paging right now, since it is not possible to hold a global `FrameAlloc` lock anymore.
This means the lock has to be reacquired again and again until we improve the paging code or the `FrameAlloc` code.

Co-authored-by: m-mueller678 <[email protected]>
@mkroening (Member, Author) commented:
With #2008 merged, this should now be ready too.

@mkroening mkroening marked this pull request as ready for review October 23, 2025 15:09
