
Improve efficiency of resource usage monitor #59

Merged
jmcphers merged 15 commits into main from feature/resource-usage-lean
Jan 23, 2026

Conversation

@jmcphers
Contributor

This change significantly improves the efficiency of resource usage monitoring for kernel sessions by reducing how much work the monitor does on each tick.

Previously, the resource monitor called refresh_processes(ProcessesToUpdate::All) every tick, which scans the entire process table on the system. This is wasteful when we only care about monitoring a handful of kernel processes and their children, and can also contribute to runaway resource usage when the process table becomes pathologically large (as happened recently in CI).

The new implementation introduces OS-specific optimizations that directly query only the processes we need. On macOS, we use the proc_listchildpids() API to efficiently enumerate child processes. On Linux, we scan /proc entries but limit ourselves to processes in the same process group as the kernel, and we cache the results so we don't rescan on every tick. Windows uses a similar caching strategy with periodic refreshes.
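
As a rough illustration of the Linux strategy described above, the following sketch filters `/proc` entries by process group and caches the result for a fixed interval. All names here (`PidCache`, `scan_process_group`, `pgrp_from_stat`) and the idea of a time-based refresh interval are assumptions made for the example; they are not taken from the PR's code.

```rust
use std::time::{Duration, Instant};

/// Extracts the process group ID (field 5) from a line in the format of
/// /proc/[pid]/stat. Hypothetical helper, not from the PR.
fn pgrp_from_stat(stat_line: &str) -> Option<u32> {
    // The comm field is parenthesized and may contain spaces or ')' itself,
    // so everything up to the LAST ')' is skipped.
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    // after_comm starts at field 3 (state); pgrp is two fields later.
    after_comm.split_whitespace().nth(2)?.parse().ok()
}

/// Returns the PIDs under /proc whose process group matches `pgrp`.
/// Processes that vanish mid-scan are silently skipped.
fn scan_process_group(pgrp: u32) -> Vec<u32> {
    let Ok(entries) = std::fs::read_dir("/proc") else {
        return Vec::new();
    };
    entries
        .flatten()
        .filter_map(|e| e.file_name().to_str()?.parse::<u32>().ok())
        .filter(|pid| {
            std::fs::read_to_string(format!("/proc/{pid}/stat"))
                .ok()
                .and_then(|s| pgrp_from_stat(&s))
                == Some(pgrp)
        })
        .collect()
}

/// Caches a PID list and rescans only after `ttl` has elapsed.
struct PidCache {
    ttl: Duration,
    last_scan: Option<Instant>,
    pids: Vec<u32>,
}

impl PidCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, last_scan: None, pids: Vec::new() }
    }

    /// Returns the cached PIDs, calling `scan` only if the cache is empty
    /// or older than `ttl`.
    fn get(&mut self, now: Instant, scan: impl FnOnce() -> Vec<u32>) -> &[u32] {
        let stale = match self.last_scan {
            Some(t) => now.duration_since(t) >= self.ttl,
            None => true,
        };
        if stale {
            self.pids = scan();
            self.last_scan = Some(now);
        }
        &self.pids
    }
}
```

On each tick a monitor could call something like `cache.get(Instant::now(), || scan_process_group(kernel_pgrp))`, so the `/proc` walk happens only when the cached list has expired.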

Unfortunately, sysinfo has a bug on Linux in which it does not compute CPU usage percentages when you ask it to refresh only a subset of processes, so additional work is needed; the change adds a custom CPU tracker in proc_stat.rs that reads CPU times directly from /proc/[pid]/stat and computes usage percentages manually.
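
To make the manual computation concrete, here is a minimal sketch of how CPU usage can be derived from `/proc/[pid]/stat`. It is not the code in proc_stat.rs; the names `parse_cpu_ticks`, `read_cpu_ticks`, and `CpuTracker` are hypothetical, and the clock tick rate is passed in by the caller (100 ticks per second is common on Linux, but real code should query it, for example via `sysconf(_SC_CLK_TCK)`).

```rust
use std::collections::HashMap;

/// Returns utime + stime (fields 14 and 15) in clock ticks from a line in
/// the format of /proc/[pid]/stat.
fn parse_cpu_ticks(stat_line: &str) -> Option<u64> {
    // The comm field is parenthesized and may contain spaces or ')' itself,
    // so everything up to the LAST ')' is skipped.
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    let fields: Vec<&str> = after_comm.split_whitespace().collect();
    // after_comm starts at field 3 (state), so field 14 is index 11
    // and field 15 is index 12.
    let utime: u64 = fields.get(11)?.parse().ok()?;
    let stime: u64 = fields.get(12)?.parse().ok()?;
    Some(utime + stime)
}

/// Reads the current CPU ticks for `pid`, or None if the process is gone
/// or the file cannot be parsed.
fn read_cpu_ticks(pid: u32) -> Option<u64> {
    let line = std::fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    parse_cpu_ticks(&line)
}

/// Remembers the last tick count per PID and turns deltas into percentages.
struct CpuTracker {
    ticks_per_sec: f64,
    last_ticks: HashMap<u32, u64>,
}

impl CpuTracker {
    fn new(ticks_per_sec: f64) -> Self {
        Self { ticks_per_sec, last_ticks: HashMap::new() }
    }

    /// Records a sample and returns CPU usage (percent of one core) since
    /// the previous sample for the same PID. Returns None on the first
    /// sample, since there is no earlier value to diff against.
    fn sample(&mut self, pid: u32, ticks: u64, elapsed_secs: f64) -> Option<f64> {
        let prev = self.last_ticks.insert(pid, ticks)?;
        if elapsed_secs <= 0.0 {
            return None;
        }
        let delta = ticks.saturating_sub(prev) as f64;
        Some(100.0 * delta / self.ticks_per_sec / elapsed_secs)
    }
}
```

A process that accumulates 50 ticks over one second at 100 ticks per second would report 50% of one core; values above 100% are possible for multi-threaded processes.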

The branch also adds an optimization to skip resource collection entirely when no clients are connected to any session, since there's no one listening for the resource update messages anyway. This should reduce unnecessary CPU overhead when Kallichore is running but the user doesn't have a browser/window connected.
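
The early-exit check could look roughly like the sketch below. The `Session` struct, its `connected_clients` field, and the `tick` function are hypothetical stand-ins for illustration; the PR does not show how Kallichore tracks client connections.

```rust
/// Hypothetical stand-in for a kernel session; only the client count matters here.
struct Session {
    connected_clients: usize,
}

/// True if at least one session has a client listening for updates.
fn should_collect(sessions: &[Session]) -> bool {
    sessions.iter().any(|s| s.connected_clients > 0)
}

/// Runs one monitor tick. `collect` is invoked only when someone is
/// listening; the return value reports whether collection happened.
fn tick(sessions: &[Session], collect: impl FnOnce()) -> bool {
    if !should_collect(sessions) {
        return false;
    }
    collect();
    true
}
```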

Finally, there's a new integration test in resource_usage_test.rs that verifies resource usage data is being populated correctly through both the HTTP API and WebSocket streams.

@jmcphers requested a review from samclark2015 January 16, 2026 22:46

@samclark2015 left a comment


LGTM! Code changes look great, with special attention paid to the unsafe blocks. One nit in the macOS process tree implementation. Tested locally on an Ubuntu VM and all looked well.

Comment on lines +99 to +117
    let count = unsafe { proc_listchildpids(pid as libc::c_int, std::ptr::null_mut(), 0) };

    if count <= 0 {
        return Vec::new();
    }

    // Allocate buffer for PIDs
    let buffer_size = count as usize;
    let mut buffer: Vec<libc::c_int> = vec![0; buffer_size];

    // SAFETY: We've allocated a buffer of sufficient size (as returned by the first call).
    // proc_listchildpids writes at most buffersize bytes to the buffer.
    let result = unsafe {
        proc_listchildpids(
            pid as libc::c_int,
            buffer.as_mut_ptr(),
            (buffer_size * size_of::<libc::c_int>()) as libc::c_int,
        )
    };


Nit: proc_listchildpids returns the number of bytes required, but is being used as count. This is functional, but overallocates memory.

Contributor Author


you're right! fixed in f3b1162
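
For readers following the nit, this sketch shows the arithmetic involved, taking the reviewer's description at face value (that the sizing call reports bytes rather than a PID count). It is not necessarily what commit f3b1162 does; the function name `pid_slots_for_bytes` is hypothetical, and `i32` stands in for `libc::c_int` so the example needs no external crate.

```rust
use std::mem::size_of;

/// Converts a byte count reported by a sizing call into the number of
/// PID-sized slots to allocate. Non-positive values (errors or "no
/// children") yield zero slots.
fn pid_slots_for_bytes(bytes_needed: i32) -> usize {
    if bytes_needed <= 0 {
        return 0;
    }
    // Round up so a partial trailing element still gets a full slot.
    (bytes_needed as usize).div_ceil(size_of::<i32>())
}
```

Under that assumption, treating a 16-byte result as a count would allocate 16 slots where 4 suffice, which matches the "functional, but overallocates" observation.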

@jmcphers merged commit 3700d8d into main Jan 23, 2026
5 checks passed
@github-actions bot locked and limited conversation to collaborators Jan 23, 2026