@pau-hedgehog
Contributor

No description provided.

@pau-hedgehog pau-hedgehog self-assigned this Sep 1, 2025
@pau-hedgehog pau-hedgehog added ci:+release Enable VLAB release tests ci:+hlab Enable hybrid VLAB tests labels Sep 1, 2025
@github-actions

github-actions bot commented Sep 1, 2025

Test Results

  8 files  32 suites   2h 4m 48s ⏱️
 20 tests  6 ✅  14 💤 0 ❌
160 runs  48 ✅ 112 💤 0 ❌

Results for commit 95db236.

♻️ This comment has been updated with latest results.

@pau-hedgehog pau-hedgehog marked this pull request as ready for review September 2, 2025 07:17
@pau-hedgehog pau-hedgehog requested a review from a team as a code owner September 2, 2025 07:17
Contributor

@edipascale edipascale left a comment

Looking good, just a couple of small things.


// Simple validation that the interface still exists and is configured
for {
out, err := execNodeCmdWOutput(testCtx.hhfabBin, testCtx.workDir, serverName,
Contributor

Should we check the lease time to be more precise and make sure that the DHCP renewal actually succeeded? There is already some code doing this in the regular DHCP test that you could reuse.
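A minimal Go sketch of such a lease time check, assuming the test can read the remaining lifetime of the address (for example the valid_lft value that ip addr prints). The helper name leaseLooksFresh and the tolerance are hypothetical; this is not the code from the regular DHCP test:

```go
package main

import (
	"fmt"
	"time"
)

// leaseLooksFresh is a hypothetical helper: it reports whether the remaining
// lifetime observed on the interface is close to the configured lease time,
// which suggests a new or renewed lease was just granted rather than an old
// lease still counting down.
func leaseLooksFresh(remaining, configured, tolerance time.Duration) bool {
	if remaining <= 0 || remaining > configured {
		// Zero or negative means no lease; more than the configured value
		// means the server handed out a different lease time than expected.
		return false
	}
	return configured-remaining <= tolerance
}

func main() {
	configured := 10 * time.Minute
	fmt.Println(leaseLooksFresh(598*time.Second, configured, 30*time.Second)) // true
	fmt.Println(leaseLooksFresh(300*time.Second, configured, 30*time.Second)) // false
}
```

A lifetime close to the configured lease right after the renewal window points to a freshly granted lease, while a value that keeps counting down suggests the renewal did not happen.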

Contributor Author

@pau-hedgehog pau-hedgehog Sep 3, 2025

As usual, you nailed it. I realized that the networkctl command that actually triggers a renewal is neither renew nor forcerenew, but reconfigure.

I'm adding the lease time check and, as a bonus, a check of the two-step lease for L3VNI.
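One way the two-step L3VNI check could look, as a hedged Go sketch: it assumes the test samples the remaining lease lifetime (in seconds) several times, and that the first lease is short and is later replaced by a long one. The function name and both thresholds are illustrative assumptions, not the DHCP server's real values:

```go
package main

import "fmt"

// sawShortThenLong is a hypothetical check for the L3VNI "two-step" lease.
// samples holds remaining lease lifetimes in seconds, in the order they were
// observed. The check passes when the first lease is short and a later
// sample shows a long lease.
func sawShortThenLong(samples []int, shortMax, longMin int) bool {
	if len(samples) == 0 || samples[0] > shortMax {
		return false // no lease yet, or the first lease was already long
	}
	for _, s := range samples[1:] {
		if s >= longMin {
			return true // a long lease replaced the short one
		}
	}
	return false // never saw the long lease
}

func main() {
	// Thresholds are illustrative only.
	fmt.Println(sawShortThenLong([]int{55, 30, 3595}, 60, 1800)) // true
	fmt.Println(sawShortThenLong([]int{3595}, 60, 1800))         // false
}
```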

Before resolving this, I'm troubleshooting some slow DHCP leases, in case my test timeouts are too restrictive or we have performance issues in the server.

@pau-hedgehog pau-hedgehog force-pushed the pau/dhcp_renew_rel_test branch 2 times, most recently from d446dcf to 3f07c55 Compare September 3, 2025 16:22
@pau-hedgehog pau-hedgehog removed the ci:+hlab Enable hybrid VLAB tests label Sep 3, 2025
@edipascale
Contributor

So I'm a little conflicted about this test as it currently stands, because I don't believe it's testing DHCP renewals anymore. Instead, we are reconfiguring the interfaces and getting brand-new leases. Additional coverage for that isn't a bad thing, but it's not what we say we are testing, right? For instance, actual L3VNI renewals should not go through the short->long lease cycle, or at least that's not what I would expect. I think that to test renewals you need to set the lease to a reasonably short time via the DHCPOptions, wait long enough for it to expire, and then check what you get afterwards. It might be possible to force an earlier renewal via networkctl, although this question appears to imply that it doesn't really work that way, and your experience seems to confirm it.
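For reference, RFC 2131 (section 4.4.5) sets the default renewal time T1 to 0.5 times the lease duration and the rebinding time T2 to 0.875 times it, so with a short lease the client should start renewing at about half the lease. A small Go sketch of that arithmetic; the 2-minute lease and 15-second margin are arbitrary, and a server can override T1/T2 with explicit options while clients may add some random jitter:

```go
package main

import (
	"fmt"
	"time"
)

// renewalTimes returns the default DHCPv4 renewal (T1) and rebinding (T2)
// times from RFC 2131 section 4.4.5: T1 = 0.5 * lease, T2 = 0.875 * lease.
func renewalTimes(lease time.Duration) (t1, t2 time.Duration) {
	return lease / 2, lease * 7 / 8
}

func main() {
	lease := 2 * time.Minute // a deliberately short lease set via DHCP options
	t1, t2 := renewalTimes(lease)
	fmt.Println(t1, t2) // 1m0s 1m45s

	// Checking shortly after T1 should show a lifetime close to the full
	// lease again if the renewal succeeded.
	fmt.Println(t1 + 15*time.Second) // 1m15s
}
```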

It's also easy to break this if we ever change the lease time values we use in the DHCP server. This is inevitable to a certain extent, although we could at least set the lease time as a DHCPOption for the "large lease" value instead of relying on the default, which might change.

None of the above means that we should reject the PR, but I think it's worth thinking about what we're doing.

The one thing I think we should change regardless is the code that parses the DHCP lease: instead of duplicating it, let's use the existing function or adapt it if it doesn't fit your needs.
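A hedged Go sketch of what a single shared lease parser could look like, assuming the lease is read from "ip -4 addr show dev <iface>" output, where iproute2 reports dynamic addresses as "valid_lft 3580sec" and static ones as "valid_lft forever". parseValidLifetime is hypothetical and not the existing function in the repository; the sample output is illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// validLftRe matches the remaining lifetime iproute2 prints for a
// dynamically assigned address, e.g. "valid_lft 3580sec preferred_lft 3580sec".
var validLftRe = regexp.MustCompile(`valid_lft (\d+)sec`)

// errNoLease is returned when no finite lifetime is present, e.g. when the
// address is static ("valid_lft forever") or missing entirely.
var errNoLease = errors.New("no DHCP lease lifetime found")

// parseValidLifetime is a hypothetical shared helper that extracts the first
// finite valid_lft value from ip addr output.
func parseValidLifetime(out string) (time.Duration, error) {
	m := validLftRe.FindStringSubmatch(out)
	if m == nil {
		return 0, errNoLease
	}
	secs, err := strconv.ParseUint(m[1], 10, 32) // DHCPv4 lease times are 32-bit seconds
	if err != nil {
		return 0, fmt.Errorf("parsing valid_lft %q: %w", m[1], err)
	}
	return time.Duration(secs) * time.Second, nil
}

func main() {
	out := `2: enp2s1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9036 qdisc fq_codel state UP
    inet 10.0.1.10/24 metric 1024 brd 10.0.1.255 scope global dynamic enp2s1
       valid_lft 3580sec preferred_lft 3580sec`
	d, err := parseValidLifetime(out)
	fmt.Println(d, err) // 59m40s <nil>
}
```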

@pau-hedgehog
Contributor Author

All the points you raise are 100% valid. Let me iterate once more to address all of them.

@pau-hedgehog pau-hedgehog marked this pull request as draft September 4, 2025 08:19
@pau-hedgehog pau-hedgehog force-pushed the pau/dhcp_renew_rel_test branch from 95db236 to 46e14bf Compare September 25, 2025 13:19
@pau-hedgehog pau-hedgehog force-pushed the pau/dhcp_renew_rel_test branch from 46e14bf to d23ea05 Compare September 25, 2025 17:48
@Frostman Frostman requested a review from Copilot September 30, 2025 20:54

Copilot AI left a comment

Pull Request Overview

This PR adds a new DHCP renewal test to verify that VPC-attached interfaces properly handle DHCP lease renewals with shorter lease times. The test configures VPC DHCP options with a reduced lease time and monitors the automatic renewal process across servers.

  • Adds comprehensive DHCP renewal testing with concurrent execution support
  • Implements VPC DHCP configuration updates with proper reversion handling
  • Adds lease parsing and validation utilities for monitoring renewal behavior

if len(servers) == 0 {
slog.Info("No servers with VPC attachments found, skipping DHCP renewal test")

return true, nil, fmt.Errorf("no servers with VPC attachments found") //nolint:goerr113

Copilot AI Sep 30, 2025

Returning an error while also returning true (success) is contradictory. Should return false to indicate test failure, or return nil error if skipping is considered success.

Suggested change
return true, nil, fmt.Errorf("no servers with VPC attachments found") //nolint:goerr113
return true, nil, nil

Copilot uses AI. Check for mistakes.
}

if testVPC == nil {
return true, nil, fmt.Errorf("no VPC found for test servers") //nolint:goerr113

Copilot AI Sep 30, 2025

Same issue as above - returning true (success) with an error is contradictory. Should return false to indicate test failure.

Suggested change
return true, nil, fmt.Errorf("no VPC found for test servers") //nolint:goerr113
return false, nil, fmt.Errorf("no VPC found for test servers") //nolint:goerr113

slog.Info("DHCP renewal test completed successfully", "servers", len(testServers), "maxDuration", maxDuration)

return false, reverts, nil

Copilot AI Sep 30, 2025

Returning false (failure) with nil error is contradictory. Should return true for success or provide an error explaining the failure.

Suggested change
return false, reverts, nil
return true, reverts, nil

testSubnet.DHCP.Options = &vpcapi.VPCDHCPOptions{
DNSServers: []string{},
TimeServers: []string{},
InterfaceMTU: 9036,

Copilot AI Sep 30, 2025

Magic number 9036 should be defined as a named constant to improve code readability and maintainability.
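A minimal Go sketch of the suggestion. dhcpOptions is a hypothetical stand-in that mirrors only the three fields of vpcapi.VPCDHCPOptions shown in the snippet, and the constant name testInterfaceMTU is an assumption:

```go
package main

import "fmt"

// dhcpOptions is a hypothetical stand-in for the vpcapi.VPCDHCPOptions fields
// used by the test; the real type lives in the fabric API and has more fields.
type dhcpOptions struct {
	DNSServers   []string
	TimeServers  []string
	InterfaceMTU uint16
}

// testInterfaceMTU names the MTU the test advertises via DHCP instead of
// repeating the literal 9036 inline.
const testInterfaceMTU = 9036

func main() {
	opts := &dhcpOptions{
		DNSServers:   []string{},
		TimeServers:  []string{},
		InterfaceMTU: testInterfaceMTU,
	}
	fmt.Println(opts.InterfaceMTU) // 9036
}
```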

@pau-hedgehog pau-hedgehog changed the title chore: add DHCP renew release test Add DHCP renew release test Oct 7, 2025
Labels

ci:+release Enable VLAB release tests

2 participants