@erfrimod
Contributor

When net_mana::tx_poll receives a CQE_TX_INVALID_OOB, it is currently logged as an Error. These events occur when the metadata does not match how HW parsed the packet. They can result from a Guest bug, a HW bug, or an OpenHCL bug. They are common, possibly due to Encap, and only affect the specific packet. Unfortunately, their frequency causes confusion when triaging other issues, because it is not clear that ty: 35 (aka CQE_TX_INVALID_OOB) is safe to ignore. This PR downgrades the tracing to Warning to help folks understand that invalid OOB issues are probably not the cause of whatever they're seeing.

  • Adding to tracelimit macros to create event_ratelimited where callers may supply the Level.
  • Modifying net_mana to trace CQE_TX_GDMA_ERR at ERROR and CQE_TX_INVALID_OOB at WARN
  • Adding two new tests, one for valid packets and one for an invalid LSO segment count (1)
    • Refactoring test helpers in net_mana to support the new tests
  • Modifying bnic to enforce LSO segment counts
  • Adding BackendQueueStats to net_backend so that new tests in net_mana can check counters: tx_packets, tx_errors, rx_packets, rx_errors
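
The first two bullets can be illustrated with a minimal, std-only sketch. It is not the tracelimit or net_mana code: `Level`, `TxCompletion`, `RateLimiter`, `trace_level`, and `emit_ratelimited` are hypothetical names, the real event_ratelimited is a macro rather than a function, and the fixed-window policy and its limits are assumptions chosen for illustration.

```rust
use std::time::{Duration, Instant};

/// Severity levels for this sketch (illustrative, not the tracing crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Warn,
    Error,
}

/// Hypothetical stand-in for the TX completion types discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TxCompletion {
    Okay,
    GdmaErr,
    InvalidOob,
}

/// Picks the trace level for a completion; `None` means nothing is traced.
pub fn trace_level(cqe: TxCompletion) -> Option<Level> {
    match cqe {
        TxCompletion::Okay => None,
        TxCompletion::GdmaErr => Some(Level::Error),
        // Only the one packet is affected, so this is a warning.
        TxCompletion::InvalidOob => Some(Level::Warn),
    }
}

/// Fixed-window rate limiter: at most `max_events` per `period` (assumed policy).
pub struct RateLimiter {
    period: Duration,
    max_events: u32,
    window_start: Option<Instant>,
    events_in_window: u32,
}

impl RateLimiter {
    pub fn new(period: Duration, max_events: u32) -> Self {
        Self {
            period,
            max_events,
            window_start: None,
            events_in_window: 0,
        }
    }

    /// Returns true if an event observed at `now` may be emitted.
    pub fn allow(&mut self, now: Instant) -> bool {
        let expired = match self.window_start {
            Some(start) => now.duration_since(start) >= self.period,
            None => true,
        };
        if expired {
            // Start a new window and reset the budget.
            self.window_start = Some(now);
            self.events_in_window = 0;
        }
        if self.events_in_window < self.max_events {
            self.events_in_window += 1;
            true
        } else {
            false
        }
    }
}

/// Formats a message at a caller-supplied level, subject to rate limiting.
/// Returns the formatted line if it was allowed, `None` if it was suppressed.
pub fn emit_ratelimited(
    limiter: &mut RateLimiter,
    now: Instant,
    level: Level,
    msg: &str,
) -> Option<String> {
    if limiter.allow(now) {
        Some(format!("{:?}: {}", level, msg))
    } else {
        None
    }
}
```

Passing `now` in explicitly keeps the limiter deterministic under test. A caller would first map the completion with `trace_level` and then hand the resulting level to `emit_ratelimited`, so the severity is chosen at the call site rather than baked into the macro name.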

@erfrimod erfrimod requested a review from a team as a code owner October 13, 2025 23:06
@Copilot Copilot AI review requested due to automatic review settings October 13, 2025 23:06
Contributor

Copilot AI left a comment

Pull Request Overview

This PR downgrades invalid OOB traces from ERROR to WARN in the net_mana networking driver. The change addresses excessive error logging for CQE_TX_INVALID_OOB events, which are common and only affect individual packets rather than indicating critical system issues.

Key Changes

  • Modified tracing level for CQE_TX_INVALID_OOB from ERROR to WARN to reduce log noise
  • Added new event_ratelimited macro to tracelimit for dynamic log level dispatch
  • Enhanced testing coverage with new validation tests for both valid packets and error handling

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| vm/devices/net/net_mana/src/lib.rs | Refactored trace_tx_error to trace_tx with dynamic level, added BackendQueueStats implementation, and enhanced test coverage |
| vm/devices/net/net_backend/src/lib.rs | Added BackendQueueStats trait and queue_stats method to Queue trait |
| vm/devices/net/net_backend/Cargo.toml | Added inspect_counters dependency |
| vm/devices/net/gdma/src/bnic.rs | Added LSO validation with error handling and completion posting |
| vm/devices/net/gdma/Cargo.toml | Added tracelimit dependency |
| support/tracelimit/src/lib.rs | Added event_ratelimited macros for dynamic log level dispatch |
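
To make the BackendQueueStats and queue_stats rows concrete, here is a rough std-only sketch of how a test might read counters through such a trait. The method signatures, the `Option<&dyn ...>` return type, the plain `u64` counters, and the `QueueStats` and `FakeQueue` types are all assumptions for illustration; the actual definitions in net_backend may look different. The sketch also assumes a failed transmit increments tx_errors only, not tx_packets.

```rust
/// Hypothetical read-only view of per-queue counters.
pub trait BackendQueueStats {
    fn tx_packets(&self) -> u64;
    fn tx_errors(&self) -> u64;
    fn rx_packets(&self) -> u64;
    fn rx_errors(&self) -> u64;
}

/// Simple counter storage used by the fake queue below.
#[derive(Debug, Default)]
pub struct QueueStats {
    tx_packets: u64,
    tx_errors: u64,
    rx_packets: u64,
    rx_errors: u64,
}

impl BackendQueueStats for QueueStats {
    fn tx_packets(&self) -> u64 {
        self.tx_packets
    }
    fn tx_errors(&self) -> u64 {
        self.tx_errors
    }
    fn rx_packets(&self) -> u64 {
        self.rx_packets
    }
    fn rx_errors(&self) -> u64 {
        self.rx_errors
    }
}

/// Hypothetical slice of a queue trait; only the stats accessor is shown.
pub trait Queue {
    /// Backends that do not track counters can keep this default.
    fn queue_stats(&self) -> Option<&dyn BackendQueueStats> {
        None
    }
}

/// Test double that records the outcome of each TX completion.
#[derive(Debug, Default)]
pub struct FakeQueue {
    stats: QueueStats,
}

impl FakeQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Counts one completed transmit as either a success or an error.
    pub fn complete_tx(&mut self, ok: bool) {
        if ok {
            self.stats.tx_packets += 1;
        } else {
            self.stats.tx_errors += 1;
        }
    }
}

impl Queue for FakeQueue {
    fn queue_stats(&self) -> Option<&dyn BackendQueueStats> {
        Some(&self.stats)
    }
}
```

With this shape, a test can send one valid packet and one packet the device rejects, then assert on tx_packets and tx_errors without parsing trace output.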

Contributor

@smalis-msft smalis-msft left a comment

I want to review the tracelimit changes first; don't merge just yet.

@erfrimod erfrimod added the backport_2505 Change should be backported to the release/2505 branch label Oct 14, 2025
@erfrimod erfrimod merged commit 7aa1add into microsoft:main Oct 14, 2025
71 of 73 checks passed
erfrimod added a commit to erfrimod/openvmm_fork2 that referenced this pull request Oct 14, 2025
…t#2163)

Not a clean cherry-pick due to changes in tracing WQE handling in net_mana/src/lib.rs

Cherry-pick of microsoft#2163
erfrimod added a commit that referenced this pull request Oct 15, 2025
…2170)

Not a clean cherry-pick due to changes in tracing WQE handling in
net_mana/src/lib.rs

Backport of #2163
@benhillis
Member

Backported in #2170

@benhillis benhillis added backported_2505 PR that has been backported to release/2505 and removed backport_2505 Change should be backported to the release/2505 branch labels Oct 16, 2025
@erfrimod erfrimod deleted the erfrimod/netmana-queit-invalid-oob branch October 22, 2025 22:12
Labels

backported_2505 PR that has been backported to release/2505

4 participants