@zyuiop
Contributor

@zyuiop zyuiop commented Jul 8, 2025

This is an old commit on my local branch.

I think this was linked to frequent issues with pages being too big when trying to de-allocate, since the physical free list contains 4KiB frames that may be part of larger 2MiB frames (and therefore not entirely free).

@zyuiop zyuiop force-pushed the feat/uefi-correct-mem-region branch from eed31ea to cbbdfb1 Compare July 8, 2025 16:26
@mkroening mkroening self-assigned this Jul 8, 2025
@mkroening mkroening self-requested a review July 8, 2025 18:03
@zyuiop zyuiop force-pushed the feat/uefi-correct-mem-region branch from cbbdfb1 to 7dcac19 Compare July 28, 2025 10:45
Contributor

@github-actions github-actions bot left a comment

Benchmark Results

| Benchmark | Current: 7dcac19 | Previous: 916c0d1 | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | 94.16 s | 99.64 s | 0.95 |
| startup_benchmark File Size | 0.81 MB | 0.87 MB | 0.94 |
| Startup Time - 1 core | 0.95 s (±0.03 s) | 0.91 s (±0.02 s) | 1.04 |
| Startup Time - 2 cores | 0.96 s (±0.03 s) | 0.91 s (±0.01 s) | 1.06 |
| Startup Time - 4 cores | 0.97 s (±0.03 s) | 0.92 s (±0.02 s) | 1.05 |
| multithreaded_benchmark Build Time | 92.88 s | 100.90 s | 0.92 |
| multithreaded_benchmark File Size | 0.92 MB | 0.97 MB | 0.94 |
| Multithreaded Pi Efficiency - 2 Threads | 88.39 % (±9.21 %) | 72.02 % (±3.89 %) | 1.23 |
| Multithreaded Pi Efficiency - 4 Threads | 43.37 % (±3.63 %) | 42.13 % (±3.21 %) | 1.03 |
| Multithreaded Pi Efficiency - 8 Threads | 25.45 % (±2.05 %) | 20.62 % (±1.67 %) | 1.23 |
| micro_benchmarks Build Time | 83.87 s | 82.66 s | 1.01 |
| micro_benchmarks File Size | 0.93 MB | 0.98 MB | 0.94 |
| Scheduling time - 1 thread | 50.80 ticks (±0.92 ticks) | 50.76 ticks (±1.58 ticks) | 1.00 |
| Scheduling time - 2 threads | 29.70 ticks (±5.67 ticks) | 29.97 ticks (±4.74 ticks) | 0.99 |
| Micro - Time for syscall (getpid) | 13.75 ticks (±1.55 ticks) | 13.62 ticks (±1.90 ticks) | 1.01 |
| Memcpy speed - (built_in) block size 4096 | 87010.10 MByte/s (±60087.12 MByte/s) | 87836.97 MByte/s (±60687.62 MByte/s) | 0.99 |
| Memcpy speed - (built_in) block size 1048576 | 44192.25 MByte/s (±30559.80 MByte/s) | 44377.25 MByte/s (±30655.67 MByte/s) | 1.00 |
| Memcpy speed - (built_in) block size 16777216 | 29846.53 MByte/s (±24496.33 MByte/s) | 30073.21 MByte/s (±24686.15 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 4096 | 87071.83 MByte/s (±60133.25 MByte/s) | 87609.46 MByte/s (±60567.75 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 1048576 | 44409.72 MByte/s (±30706.16 MByte/s) | 44601.88 MByte/s (±30809.54 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 16777216 | 30599.47 MByte/s (±24922.16 MByte/s) | 30861.29 MByte/s (±25146.75 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 4096 | 80147.08 MByte/s (±55731.73 MByte/s) | 78085.94 MByte/s (±54382.07 MByte/s) | 1.03 |
| Memcpy speed - (rust) block size 1048576 | 44184.11 MByte/s (±30545.64 MByte/s) | 44176.28 MByte/s (±30556.63 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 29655.58 MByte/s (±24323.54 MByte/s) | 30071.23 MByte/s (±24675.89 MByte/s) | 0.99 |
| Memset speed - (rust) block size 4096 | 80981.65 MByte/s (±56205.91 MByte/s) | 78452.24 MByte/s (±54650.22 MByte/s) | 1.03 |
| Memset speed - (rust) block size 1048576 | 44406.32 MByte/s (±30695.39 MByte/s) | 44416.03 MByte/s (±30723.13 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 30412.25 MByte/s (±24753.07 MByte/s) | 30862.88 MByte/s (±25137.48 MByte/s) | 0.99 |
| alloc_benchmarks Build Time | 79.66 s | 79.90 s | 1.00 |
| alloc_benchmarks File Size | 0.88 MB | 0.94 MB | 0.94 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 69.97 % (±0.20 %) | 69.94 % (±0.25 %) | 1.00 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 9555.20 Ticks (±124.15 Ticks) | 10148.44 Ticks (±733.14 Ticks) | 0.94 |
| Allocations - Average Allocation time (no fail) | 9555.20 Ticks (±124.15 Ticks) | 10148.44 Ticks (±733.14 Ticks) | 0.94 |
| Allocations - Average Deallocation time | 656.06 Ticks (±12.86 Ticks) | 662.38 Ticks (±35.19 Ticks) | 0.99 |
| mutex_benchmark Build Time | 78.86 s | 80.36 s | 0.98 |
| mutex_benchmark File Size | 0.93 MB | 0.98 MB | 0.94 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 11.36 ns (±0.48 ns) | 11.50 ns (±0.50 ns) | 0.99 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 12.90 ns (±0.73 ns) | 12.88 ns (±0.71 ns) | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@mkroening
Member

> I think this was linked to frequent issues with pages being too big when trying to de-allocate, since the physical free list contains 4KiB frames that may be part of larger 2MiB frames (and therefore not entirely free).

I don't really understand the issue yet. Why is it a problem to manage free physical memory at 4 KiB granularity?

The CI failure is due to allocating zero-sized layouts when setting up the heap. While this should be handled gracefully by the free list (tracked in mkroening/free-list#3), we should also do something about that here if we want to continue with this PR.
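
One possible way to "do something about that here" is to reject empty requests before they reach the free list. The following is only a minimal sketch under that assumption: `nonzero_layout` is a hypothetical helper, not a function from this PR or from the free-list crate, and the free list's own API is deliberately not shown.

```rust
use std::alloc::Layout;

/// Hypothetical guard (not part of this PR): build a `Layout` for a heap
/// region, but refuse zero-sized requests so they never reach the free list.
fn nonzero_layout(size: usize, align: usize) -> Option<Layout> {
    if size == 0 {
        // A zero-sized layout is valid for `Layout`, but there is nothing
        // to allocate, so the caller should skip the allocation entirely.
        return None;
    }
    // `from_size_align` fails if `align` is zero or not a power of two,
    // or if the rounded-up size would overflow.
    Layout::from_size_align(size, align).ok()
}
```

A caller setting up the heap could then match on the result and skip the allocation when it is `None`, instead of passing an empty layout on.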

@zyuiop
Contributor Author

zyuiop commented Jul 28, 2025

I think this is part of the larger discussion about the memory layout we inherit from UEFI.
I think we should close this.

For context, I think the issue is that we obtain memory mapped as large pages from the firmware, but we mark memory as free in the free list in small (4 KiB) pages. This is mostly fine, but if you take a page from the free list and try to change its flags, the change may fail, and you may not be able to split the parent large page safely, especially if the free list starts at the end of a large page that the kernel also uses for other purposes.
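
The mismatch described above can be illustrated with some address arithmetic. This is a rough sketch, not code from the kernel or from this PR: `Range`, `large_page_base`, `large_page_entirely_free` and `align_to_large_pages` are hypothetical names, and the sizes assume x86-64 with 4 KiB small frames and 2 MiB large pages.

```rust
/// Assumed page sizes (x86-64): 4 KiB small frames, 2 MiB large pages.
const SMALL: usize = 4 * 1024;
const LARGE: usize = 2 * 1024 * 1024;

/// A half-open physical address range `[start, end)` (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Range {
    start: usize,
    end: usize,
}

/// Base address of the 2 MiB large page that contains `addr`.
fn large_page_base(addr: usize) -> usize {
    addr & !(LARGE - 1)
}

/// Returns true if the whole 2 MiB page around `frame` lies inside a single
/// free range, i.e. nothing else shares that large-page mapping.
fn large_page_entirely_free(frame: usize, free: &[Range]) -> bool {
    let base = large_page_base(frame);
    free.iter().any(|r| r.start <= base && base + LARGE <= r.end)
}

/// Shrinks a free range to 2 MiB boundaries, or returns `None` if no whole
/// large page fits inside it.
fn align_to_large_pages(r: Range) -> Option<Range> {
    let start = r.start.checked_add(LARGE - 1)? & !(LARGE - 1);
    let end = r.end & !(LARGE - 1);
    (start < end).then_some(Range { start, end })
}
```

For example, a free range that begins at `0x1ff000` (the last 4 KiB frame of the first large page) contains a frame whose parent large page is not entirely free, while every frame from `0x200000` onward sits in a large page that is. Whether trimming the free list this way is the right fix is exactly the open layout question raised in this thread.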

@mkroening
Member

Ah, that makes sense. 👍

@mkroening mkroening closed this Aug 5, 2025