@brendanobra brendanobra commented Oct 8, 2025

What

This PR fixes a critical HashMap memory leak in the Firebolt gateway session management and introduces a comprehensive type-safe identifier system to prevent similar issues in the future.

Key Changes:

  • Critical Bug Fix: Fixed HashMap leak in firebolt_gateway.rs where session cleanup used inconsistent identifier keys
  • Type Safety System: Added comprehensive newtype wrappers in types.rs for SessionId, ConnectionId, AppId, AppInstanceId, RequestId, and DeviceSessionId
  • Validation Framework: Implemented content validation for UUID vs human-readable identifiers
  • Gradual Migration Support: Added conditional compilation support for gradual adoption across the codebase

Why

Memory Leak Issue:
The original code had a critical HashMap memory leak where session registration used session_id as the key, but session cleanup used connection_id, resulting in sessions never being properly removed from memory. This could lead to unbounded memory growth in production environments with high session turnover.

Type Safety Need:
The codebase frequently conflated different identifier types (session IDs, connection IDs, app IDs) using plain String types, making it easy to accidentally use the wrong identifier and creating potential for similar bugs in the future.

How

Immediate Fix:

  • Modified UnregisterSession handler in firebolt_gateway.rs to consistently use cid (connection ID) for all cleanup operations, matching the key used during registration
  • This ensures HashMap operations use consistent keys: register(cid, session) → cleanup(cid)
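
The key mismatch described above can be reproduced in a few lines. The following is a minimal sketch, not the gateway's actual code: `SessionTable`, its fields, and the string payload are hypothetical stand-ins, and only the "register and clean up with the same key" idea comes from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the gateway's session map.
#[derive(Default)]
pub struct SessionTable {
    // key -> session payload (a plain String here, for illustration only)
    sessions: HashMap<String, String>,
}

impl SessionTable {
    /// Registers a session under the connection ID.
    pub fn register(&mut self, cid: &str, session: &str) {
        self.sessions.insert(cid.to_owned(), session.to_owned());
    }

    /// Removes the entry stored under `key`.
    /// Returns true only if an entry was actually removed.
    pub fn cleanup(&mut self, key: &str) -> bool {
        self.sessions.remove(key).is_some()
    }

    /// Number of sessions still held in memory.
    pub fn len(&self) -> usize {
        self.sessions.len()
    }
}
```

Calling `cleanup` with any key other than the one passed to `register` silently removes nothing, so the entry stays in the map for the lifetime of the process. Returning a `bool` (or the removed value) from cleanup makes that failure observable in tests.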

Long-term Prevention:

  • Created newtype wrappers with #[serde(transparent)] for JSON compatibility
  • Implemented validation functions distinguishing UUID formats (for session/connection IDs) from human-readable formats (for app IDs)
  • Added comprehensive trait implementations (Debug, Clone, PartialEq, Eq, Hash, FromStr, TryFrom)
  • Provided gradual migration path via conditional compilation features
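
As a rough illustration of the newtype approach, the sketch below wraps two identifiers in distinct types so that the compiler rejects a lookup with the wrong one. It uses only the standard library; the real types in types.rs also carry `#[serde(transparent)]` and validation, which are omitted here. `Registry` and the `new` constructors are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Distinct wrapper types: a SessionId can no longer be passed
/// where a ConnectionId is expected, even though both hold a String.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionId(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ConnectionId(String);

impl SessionId {
    pub fn new(raw: impl Into<String>) -> Self {
        Self(raw.into())
    }
}

impl ConnectionId {
    pub fn new(raw: impl Into<String>) -> Self {
        Self(raw.into())
    }
}

/// Hypothetical registry keyed strictly by ConnectionId.
#[derive(Default)]
pub struct Registry {
    by_connection: HashMap<ConnectionId, SessionId>,
}

impl Registry {
    pub fn register(&mut self, cid: ConnectionId, sid: SessionId) {
        self.by_connection.insert(cid, sid);
    }

    /// Only a ConnectionId is accepted here; passing a &SessionId
    /// is a compile-time type error rather than a silent miss.
    pub fn unregister(&mut self, cid: &ConnectionId) -> Option<SessionId> {
        self.by_connection.remove(cid)
    }

    pub fn len(&self) -> usize {
        self.by_connection.len()
    }
}
```

With this shape, the class of bug fixed in this PR (cleaning up with a different identifier than the one used for registration) fails to compile instead of leaking at runtime.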

Type System Design:

  • SessionId, ConnectionId, DeviceSessionId: UUID validation (strict format)
  • AppId: Human-readable validation (allows spaces, rejects UUIDs to prevent confusion)
  • AppInstanceId, RequestId: UUID validation for tracking
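
The validation split above might look roughly like the following. This is a sketch under stated assumptions: the `is_uuid` helper, the `IdError` enum, and the exact rules (canonical 8-4-4-4-12 hexadecimal layout, empty strings rejected) are illustrative and are not taken from types.rs.

```rust
use std::str::FromStr;

/// True if `s` has the canonical 8-4-4-4-12 hexadecimal UUID layout.
pub fn is_uuid(s: &str) -> bool {
    let lens = [8usize, 4, 4, 4, 12];
    let groups: Vec<&str> = s.split('-').collect();
    groups.len() == lens.len()
        && groups
            .iter()
            .zip(lens.iter())
            .all(|(g, n)| g.len() == *n && g.chars().all(|c| c.is_ascii_hexdigit()))
}

/// Hypothetical error type for identifier validation.
#[derive(Debug, PartialEq, Eq)]
pub enum IdError {
    Empty,
    NotUuid,
    LooksLikeUuid,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionId(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AppId(String);

impl FromStr for SessionId {
    type Err = IdError;

    /// Session IDs must be UUID-shaped.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            return Err(IdError::Empty);
        }
        if is_uuid(s) {
            Ok(SessionId(s.to_owned()))
        } else {
            Err(IdError::NotUuid)
        }
    }
}

impl FromStr for AppId {
    type Err = IdError;

    /// App IDs are human-readable: spaces are allowed, but a
    /// UUID-shaped value is rejected so it cannot pass as a session ID.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.trim().is_empty() {
            return Err(IdError::Empty);
        }
        if is_uuid(s) {
            return Err(IdError::LooksLikeUuid);
        }
        Ok(AppId(s.to_owned()))
    }
}
```

Because both types implement `FromStr`, callers can write `raw.parse::<SessionId>()` at the boundary and work with validated values everywhere else.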

let session_id_clone = session_id.clone();
tokio::spawn(async move {
if let Err(e) = callback.sender.send(output_clone).await {
error!(

Check failure (Code scanning / CodeQL): Cleartext logging of sensitive information (High). This operation writes session_id_clone to a log file.
}
});

debug!("Event {} sent to client {}", method, session_id);

Check failure (Code scanning / CodeQL): Cleartext logging of sensitive information (High). This operation writes session_id to a log file.
// Check if we already have a Thunder subscription for this method
if let Some(sub_state) = method_subscriptions.get_mut(&event_method_key) {
// Thunder subscription exists, just add this client to the fanout list
debug!(

Check failure (Code scanning / CodeQL): Cleartext logging of sensitive information (High). This operation writes session_id_str to a log file.
if !clients.contains(&session_id_str) {
clients.push(session_id_str.clone());
sub_state.client_count += 1;
debug!(

Check failure (Code scanning / CodeQL): Cleartext logging of sensitive information (High). This operation writes session_id_str to a log file.
}
} else {
// No Thunder subscription exists, need to create one
debug!(

Check failure (Code scanning / CodeQL): Cleartext logging of sensitive information (High). This operation writes session_id_str to a log file.
}
} else {
// Client wants to unsubscribe
debug!(

Check failure (Code scanning / CodeQL): Cleartext logging of sensitive information (High). This operation writes session_id_str to a log file.

Critical Memory Leak Fix:
- Fixed incomplete session cleanup in remove_session_from_events()
- Changed from removing only first EventListener to removing ALL listeners per session
- Used retain() instead of manual remove(index) for comprehensive cleanup
- Added test_remove_session_removes_all_listeners() to prevent regression
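
The difference between removing the first match and using retain() can be sketched as follows. The `EventListener` fields and both function names are hypothetical; the actual remove_session_from_events() operates on the gateway's own state.

```rust
/// Simplified listener record; the real EventListener has other fields.
#[derive(Debug, Clone, PartialEq)]
pub struct EventListener {
    pub session_id: String,
    pub event: String,
}

/// Old behavior (sketch): finds the first listener for the session and
/// removes only that one, leaving any further listeners behind.
pub fn remove_first(listeners: &mut Vec<EventListener>, session_id: &str) {
    if let Some(index) = listeners.iter().position(|l| l.session_id == session_id) {
        listeners.remove(index);
    }
}

/// New behavior (sketch): retain() keeps only listeners that belong to
/// other sessions, so every listener for `session_id` is dropped.
pub fn remove_all(listeners: &mut Vec<EventListener>, session_id: &str) {
    listeners.retain(|l| l.session_id != session_id);
}
```

A session with N listeners loses only one of them under `remove_first`, which matches the per-session retention described in this commit; `remove_all` also avoids the repeated element shifting that calling `remove(index)` in a loop would cause.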

Memory leak was causing 1-4KB retention per session with multiple event listeners,
scaling linearly with Firebolt API usage. This was particularly problematic on
embedded aarch64 devices with constrained memory.

Struct Alignment Optimizations:
- Optimized CallContext field ordering to reduce padding (~8 bytes saved per instance)
- Eliminated SessionData wrapper in Session struct for better cache locality
- Reordered PendingSessionInfo fields to minimize internal padding
- Improved memory alignment for frequently allocated structures
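
The effect of field ordering on padding can be checked with `std::mem::size_of`. The sketch below uses `#[repr(C)]`, where the declared order fixes the layout; with Rust's default representation the compiler is free to reorder fields, so the savings for a given struct depend on how it is declared. `Padded` and `Reordered` are made-up structs, not CallContext or PendingSessionInfo.

```rust
use std::mem::size_of;

/// Small fields on either side of a u64 force padding before and after it.
/// With 8-byte u64 alignment: 1 + 7 (pad) + 8 + 1 + 7 (pad) = 24 bytes.
#[repr(C)]
pub struct Padded {
    pub flag_a: u8,
    pub counter: u64,
    pub flag_b: u8,
}

/// Largest field first, small fields grouped together.
/// With 8-byte u64 alignment: 8 + 1 + 1 + 6 (pad) = 16 bytes.
#[repr(C)]
pub struct Reordered {
    pub counter: u64,
    pub flag_a: u8,
    pub flag_b: u8,
}

/// Returns (size of Padded, size of Reordered) on the current target.
pub fn sizes() -> (usize, usize) {
    (size_of::<Padded>(), size_of::<Reordered>())
}
```

Asserting on `size_of` in a unit test is a cheap way to keep a layout optimization from regressing when fields are added later.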

Impact:
- Eliminates proportional memory growth with Firebolt API calls
- Reduces per-EventListener memory overhead by 8+ bytes
- Improves cache locality and reduces heap fragmentation
- Critical for embedded deployment on memory-constrained devices

All tests pass (27 session-related tests), no functional regressions.
Validated with clippy and comprehensive test suite.

Code Coverage

Package Line Rate Health
core.main.src.bootstrap 0%
core.main.src.broker 73%
core.main.src.state.cap 42%
core.main.src.firebolt.handlers 11%
core.main.src.service.ripple_service 9%
core.sdk.src.processor 6%
core.sdk.src.extn.ffi 0%
core.sdk.src.utils 63%
core.sdk.src.api 46%
core.main.src 0%
core.sdk.src.service.mock_app_gw 0%
core.tdk.src.utils 0%
core.main.src.utils 30%
core.sdk.src.api.firebolt 85%
core.sdk.src.extn 76%
device.thunder_ripple_sdk.src.processors.events 0%
core.main.src.service 33%
core.sdk.src.api.observability 57%
core.main.src.service.apps 26%
core.sdk.src.api.gateway 69%
core.sdk.src.framework 64%
core.main.src.bootstrap.manifest 0%
core.sdk.src.service.mock_app_gw.appgw 0%
device.mock_device.src 55%
device.thunder_ripple_sdk.src.events 4%
core.sdk.src.api.distributor 68%
device.thunder_ripple_sdk.src 13%
core.main.src.broker.thunder 37%
core.sdk.src.extn.client 82%
core.sdk.src 78%
core.tdk.src.gateway 100%
core.main.src.state 37%
core.main.src.service.extn 24%
core.sdk.src.manifest 0%
core.sdk.src.service 41%
device.thunder_ripple_sdk.src.processors 19%
core.sdk.src.api.manifest 74%
core.main.src.broker.test 90%
core.main.src.bootstrap.extn 0%
device.thunder_ripple_sdk.src.bootstrap 0%
core.sdk.src.api.device 76%
core.main.src.firebolt 22%
core.main.src.processor.storage 0%
core.main.src.processor 0%
device.thunder_ripple_sdk.src.client 61%
core.main.src.broker.rules 78%
Summary 50% (22861 / 45807)

Minimum allowed line rate is 48%
