net_mana: downgrading invalid OOB traces from ERROR to WARN #2163
Pull Request Overview
This PR downgrades invalid OOB traces from ERROR to WARN in the net_mana networking driver. The change addresses excessive error logging for CQE_TX_INVALID_OOB events, which are common and affect only individual packets rather than indicating critical system issues.
Key Changes
- Modified tracing level for CQE_TX_INVALID_OOB from ERROR to WARN to reduce log noise
- Added new `event_ratelimited` macro to tracelimit for dynamic log level dispatch
- Enhanced testing coverage with new validation tests for both valid packets and error handling
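The idea of rate-limited logging with a level chosen at runtime can be sketched with the standard library alone. This is not the tracelimit implementation, which is not shown in this PR page; `RateLimiter`, `emit_ratelimited`, and the fixed-window policy below are all hypothetical names and assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Severity chosen at runtime (hypothetical; stands in for a tracing level).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Warn,
    Error,
}

/// Fixed-window rate limiter: at most `max_events` per `period`.
/// Assumption: the real tracelimit policy may differ.
pub struct RateLimiter {
    period: Duration,
    max_events: u32,
    window_start: Option<Instant>,
    seen: u32,
}

impl RateLimiter {
    pub fn new(period: Duration, max_events: u32) -> Self {
        Self {
            period,
            max_events,
            window_start: None,
            seen: 0,
        }
    }

    /// Returns true if an event observed at `now` should be emitted.
    pub fn allow(&mut self, now: Instant) -> bool {
        let expired = match self.window_start {
            Some(start) => now.duration_since(start) >= self.period,
            None => true,
        };
        if expired {
            // Start a new window and reset the counter.
            self.window_start = Some(now);
            self.seen = 0;
        }
        self.seen = self.seen.saturating_add(1);
        self.seen <= self.max_events
    }
}

/// Formats a message at a runtime-selected level, or returns `None`
/// when the limiter suppresses it.
pub fn emit_ratelimited(
    limiter: &mut RateLimiter,
    now: Instant,
    level: Level,
    msg: &str,
) -> Option<String> {
    if !limiter.allow(now) {
        return None;
    }
    let prefix = match level {
        Level::Warn => "WARN",
        Level::Error => "ERROR",
    };
    Some(format!("{} {}", prefix, msg))
}
```

Passing `now` explicitly keeps the sketch deterministic to test. A caller would typically pass `Instant::now()` and write the returned string to its log sink.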
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vm/devices/net/net_mana/src/lib.rs | Refactored trace_tx_error to trace_tx with dynamic level, added BackendQueueStats implementation, and enhanced test coverage |
| vm/devices/net/net_backend/src/lib.rs | Added BackendQueueStats trait and queue_stats method to Queue trait |
| vm/devices/net/net_backend/Cargo.toml | Added inspect_counters dependency |
| vm/devices/net/gdma/src/bnic.rs | Added LSO validation with error handling and completion posting |
| vm/devices/net/gdma/Cargo.toml | Added tracelimit dependency |
| support/tracelimit/src/lib.rs | Added event_ratelimited macros for dynamic log level dispatch |
I want to review the tracelimit changes first; don't merge just yet.
…t#2163) Not a clean cherry-pick due to changes in tracing WQE handling in net_mana/src/lib.rs. Cherry-pick of microsoft#2163.
Backported in #2170
When `net_mana::tx_poll` receives a `CQE_TX_INVALID_OOB`, it is currently logged as an Error. These events occur when the metadata does not match how HW parsed the packet. They can be the result of a Guest bug, HW bug, or OpenHCL bug. They are common, possibly due to Encap, and only affect the specific packet. Unfortunately, their frequency causes confusion when triaging other issues, as it is not clear that `ty: 35` (aka `CQE_TX_INVALID_OOB`) is safe to ignore. This PR downgrades the tracing to Warning to help folks understand that invalid OOB issues are probably not the cause of whatever they're seeing.
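The level policy described above can be sketched as a small mapping from completion type to severity. This is an illustration under assumptions, not the code in `net_mana/src/lib.rs`: the value 35 for `CQE_TX_INVALID_OOB` comes from the `ty: 35` in the description, while `HYPOTHETICAL_CQE_TX_OKAY`, its value, and the function `tx_trace_level` are invented for the example.

```rust
/// Log severities, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Warn,
    Error,
}

/// Completion type that appears as `ty: 35` in the traces
/// (value taken from the PR description).
pub const CQE_TX_INVALID_OOB: u32 = 35;

/// Hypothetical stand-in for a successful TX completion; the real
/// constant and its value are not given in this PR page.
pub const HYPOTHETICAL_CQE_TX_OKAY: u32 = 1;

/// Returns the level at which a TX completion should be traced,
/// or `None` when nothing should be logged.
pub fn tx_trace_level(cqe_type: u32) -> Option<Level> {
    match cqe_type {
        // Successful completions are not traced as problems.
        HYPOTHETICAL_CQE_TX_OKAY => None,
        // Common and limited to a single packet: warn only.
        CQE_TX_INVALID_OOB => Some(Level::Warn),
        // Assumption: every other completion type stays at ERROR.
        _ => Some(Level::Error),
    }
}
```

Keeping the decision in one function means the trace call site only needs a single dynamic-level dispatch, rather than one logging statement per completion type.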