Skip to content

Conversation

edipascale
Copy link
Contributor

@edipascale edipascale commented Apr 3, 2025

Generate wirings for externals as part of hhfab vlab gen. Add parameters to determine how many externals and external connections are created, and where the ext. connections are placed (i.e. mclag, eslag or orphan leaves).

All attachments are currently vlan tagged, I didn't want to over complicate things with extra params for untagged.
Also all externals are in the same namespace, and attachments are created for all externals and all connections.

Closes: #521

@edipascale edipascale requested a review from Frostman as a code owner April 3, 2025 10:39
@edipascale edipascale requested review from Copilot and Frostman and removed request for Frostman and Copilot April 3, 2025 10:40
Copilot

This comment was marked as outdated.

@edipascale edipascale force-pushed the ema/external-autogen branch from 60b442d to 1727b00 Compare April 3, 2025 13:35
@edipascale
Copy link
Contributor Author

I've added commits to also address #522, but we have a problem here:

In terms of having the ci passing, it would be easy enough to add explicit values to the vlab-gen for either the collapsed-core (if we keep the defaults as described above) or the spine-leaf cases (if we do not touch the current default). The question is what is the "least undesirable" option @Frostman

Copy link
Member

@Frostman Frostman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably generated conns to external from the the from both switches of the last redundancy group configured. Let's chat next week and discuss what would be the best approach.

@edipascale edipascale force-pushed the ema/vlab-external-testing branch from 50d6799 to 34d6562 Compare April 7, 2025 07:57
@edipascale edipascale force-pushed the ema/external-autogen branch from 1727b00 to d74be05 Compare April 7, 2025 14:00
@edipascale edipascale force-pushed the ema/vlab-external-testing branch 2 times, most recently from 25246f0 to a8c57f4 Compare April 8, 2025 15:05
Base automatically changed from ema/vlab-external-testing to master April 9, 2025 05:39
@edipascale edipascale force-pushed the ema/external-autogen branch from d74be05 to 698fbab Compare April 9, 2025 07:13
@edipascale edipascale force-pushed the ema/external-autogen branch from 698fbab to 76bd390 Compare April 9, 2025 18:02
@edipascale
Copy link
Contributor Author

current status:

  • the changes in release-test to use the first external have to be reverted or at least adapted; we cannot use virtual externals for most scenarios, as multiple external peerings (together with the limitation on virtual switches and ACLs) mean that VPCs will be able to ping each other even when they shouldn't. We can either revert the change, or fetch the annotations for externals and discard non-hardware ones.
  • on top of that, there is https://github.com/githedgehog/internal/issues/145 which makes test-connectivity jobs with the new external peering fail in the ci

@edipascale edipascale force-pushed the ema/external-autogen branch from 76bd390 to 8222ee8 Compare April 15, 2025 13:23
@edipascale
Copy link
Contributor Author

current status:

  • the changes in release-test to use the first external have to be reverted or at least adapted; we cannot use virtual externals for most scenarios, as multiple external peerings (together with the limitation on virtual switches and ACLs) mean that VPCs will be able to ping each other even when they shouldn't. We can either revert the change, or fetch the annotations for externals and discard non-hardware ones.

I went for the second option but it's not perfect. Technically the issue is not with virtual externals, but with virtual switches attached to any type of externals. Because right now we only have hardware externals connected to hardware switches and virtual externals connected to virtual switches, discarding virtual externals is the easy solution... but we probably want to revisit it.

I've already got an answer from BCM on the case I opened requesting more info, which I provided; let's see if we can get unblocked there. In the meanwhile I reduced the autogenerated external connections from 2 to 1, which should work even with this strange issue.

Also I'm seeing several failures in the CI, both here and in other PRs. but they do not appear related to the changes. I'll dig a bit more tomorrow.

@edipascale edipascale force-pushed the ema/external-autogen branch from 8222ee8 to 44d4e94 Compare April 16, 2025 15:26
@Frostman Frostman marked this pull request as draft April 22, 2025 15:13
@edipascale edipascale force-pushed the ema/external-autogen branch 3 times, most recently from a59936d to 358d9e7 Compare May 8, 2025 08:37
@edipascale edipascale force-pushed the ema/external-autogen branch from 358d9e7 to db78dc5 Compare May 8, 2025 12:46
@edipascale
Copy link
Contributor Author

@Frostman this is technically ready to review now. I've removed the default external generation as discussed, and added warnings in case an option with multiple external connections is selected. The only thing missing is a run on a physical environment to check whether we need any additional fix for https://github.com/githedgehog/lab/issues/247, unsure whether we want to wait for that before tagging this as ready

@edipascale edipascale force-pushed the ema/external-autogen branch from db78dc5 to e829871 Compare May 19, 2025 15:02
@edipascale edipascale marked this pull request as ready for review May 20, 2025 08:43
@edipascale edipascale requested review from a team as code owners May 20, 2025 08:43
@edipascale
Copy link
Contributor Author

I could finally check this on physical environments, please see https://github.com/githedgehog/lab/issues/247 for details but this should be ready to be reviewed

@edipascale edipascale force-pushed the ema/external-autogen branch 2 times, most recently from a76751e to d60b201 Compare June 2, 2025 16:30
@edipascale edipascale force-pushed the ema/external-autogen branch from d60b201 to 3837965 Compare June 24, 2025 09:33
@edipascale edipascale force-pushed the ema/external-autogen branch from 3837965 to f1b3c2e Compare July 1, 2025 16:08
Copy link

github-actions bot commented Jul 1, 2025

Test Results

  9 files  ±0  27 suites  ±0   4h 39m 48s ⏱️ +37s
 16 tests ±0  16 ✅ ±0   0 💤 ±0  0 ❌ ±0 
144 runs  ±0  56 ✅ ±0  88 💤 ±0  0 ❌ ±0 

Results for commit f1b3c2e. ± Comparison against base commit 8408af0.

@edipascale edipascale force-pushed the ema/external-autogen branch from f1b3c2e to 967ae34 Compare July 16, 2025 09:07
@edipascale edipascale force-pushed the ema/external-autogen branch 2 times, most recently from 7d0e6a3 to 9045927 Compare July 28, 2025 11:08
@edipascale
Copy link
Contributor Author

@Frostman I've also added an iptable rule to mimic the ACL to drop fabric-to-fabric traffic that we recently added on the ds2000-01, it appears to work:

core@control-1 ~ $ kubectl get externalpeering
NAME                  VPC      EXTERNAL      AGE
vpc-01--external-01   vpc-01   external-01   6m18s
vpc-08--external-02   vpc-08   external-02   6m18s

and

$ ./hhfab-ve vlab test-connectivity --source server-01 --source server-08 --destination server-01 --destination server-08 -v
12:10:12 INF Hedgehog Fabricator version=v0.40.0-107-gc9be7a7c-edp
12:10:12 INF Wiring hydrated successfully mode=if-not-present
12:10:12 INF VLAB config loaded file=vlab/config.yaml
12:10:12 INF Testing server to server and server to external connectivity
12:10:12 WRN All switches are virtual, ignoring IPerf min speed
12:10:12 INF Waiting for switches and gateways ready appliedFor=1m0s timeout=30m0s
12:10:12 INF Expected switches agent=v0.86.0 switches="[leaf-02 leaf-03 leaf-04 leaf-05 spine-01 spine-02 leaf-01]"
12:10:12 INF All switches and gateways are ready took=404.595918ms
12:10:13 INF Discovering server IPs servers=10
12:10:13 INF Found server=server-01 addr=10.0.1.2/24
12:10:13 INF Found server=server-08 addr=10.0.8.2/24
12:10:13 INF Running pings, iperfs and curls servers=10
12:10:13 DBG Checking external connectivity from=server-01 reachable=true
12:10:13 DBG Running curls from=server-01 to=1.0.0.1 count=3
12:10:13 DBG Checking external connectivity from=server-08 reachable=true
12:10:13 DBG Running curls from=server-08 to=1.0.0.1 count=3
12:10:13 DBG Checking connectivity from=server-08 to=server-01 expected=false
12:10:13 DBG Checking connectivity from=server-01 to=server-08 expected=false
12:10:13 DBG Running ping from=server-08 to=10.0.1.2
12:10:13 DBG Running ping from=server-01 to=10.0.8.2
12:10:13 DBG Curl result from=server-01 to=1.0.0.1 expected=true ok=true fail=false err=<nil> out="<html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
12:10:13 DBG Curl result from=server-08 to=1.0.0.1 expected=true ok=true fail=false err=<nil> out="<html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
12:10:13 DBG Curl result from=server-01 to=1.0.0.1 expected=true ok=true fail=false err=<nil> out="<html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
12:10:13 DBG Curl result from=server-08 to=1.0.0.1 expected=true ok=true fail=false err=<nil> out="<html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
12:10:13 DBG Curl result from=server-01 to=1.0.0.1 expected=true ok=true fail=false err=<nil> out="<html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
12:10:13 DBG Curl result from=server-08 to=1.0.0.1 expected=true ok=true fail=false err=<nil> out="<html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
12:10:18 DBG Ping result from=server-08 to=server-01 expected=false ok=false fail=true err="Process exited with status 1" out="PING 10.0.1.2 (10.0.1.2) 56(84) bytes of data.\n\n--- 10.0.1.2 ping statistics ---\n5 packets transmitted, 0 received, 100% packet loss, time 4119ms"
12:10:18 DBG Ping result from=server-01 to=server-08 expected=false ok=false fail=true err="Process exited with status 1" out="PING 10.0.8.2 (10.0.8.2) 56(84) bytes of data.\n\n--- 10.0.8.2 ping statistics ---\n5 packets transmitted, 0 received, 100% packet loss, time 4137ms"
12:10:18 INF All connectivity tested successfully took=6.580151402s

@Frostman Frostman requested review from Frostman and Copilot July 29, 2025 03:45
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds functionality to generate wiring for external connections as part of the VLAB generation process. It introduces parameters to control the number of externals and external connections created, with placement options for MCLAG, ESLAG, or orphan leaves.

  • Adds new CLI flags for configuring external generation parameters
  • Improves external connection validation and error handling in VLAB config
  • Updates BGP configuration template for better multipath support

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/hhfab/vlabconfig.go Enhanced external connection validation logic with better error handling for mixed hardware/virtual scenarios
pkg/hhfab/vlabbuilder.go Added external generation functionality with new parameters and connection creation methods
pkg/hhfab/vlab_external_butane.tmpl.yaml Updated BGP configuration to enable multipath-relax and improved routing policies
pkg/hhfab/release.go Refactored external detection logic and removed NamedExternal skip flag
cmd/hhfab/main.go Added CLI flags for external configuration parameters
Comments suppressed due to low confidence (2)

pkg/hhfab/vlabbuilder.go:932

  • [nitpick] The comment removal //nolint:unparam suggests this function now returns different error types, but the function signature and implementation appear unchanged. Consider adding back the nolint comment if the function still always returns the same error pattern, or document why it was removed.
func (b *VLABBuilder) createConnection(ctx context.Context, spec wiringapi.ConnectionSpec) (*wiringapi.Connection, error) {

pkg/hhfab/vlabbuilder.go:782

  • [nitpick] The use of double hyphens '--' in the attachment name could be confusing and may cause parsing issues. Consider using a single hyphen or underscore for better readability and consistency.
				extAttachName := fmt.Sprintf("%s--%s", conn.Spec.External.Link.Switch.DeviceName(), ext.Name)

@edipascale edipascale force-pushed the ema/external-autogen branch from c9be7a7 to 822128a Compare July 31, 2025 08:32
@edipascale edipascale force-pushed the ema/external-autogen branch from 822128a to 79c60af Compare August 14, 2025 14:16
@edipascale edipascale force-pushed the ema/external-autogen branch from 79c60af to 59e6912 Compare August 25, 2025 06:49
to speed up testing with the new external spawn options, auto
generate wirings for externals if desired

Signed-off-by: Emanuele Di Pascale <[email protected]>
use the first hardware external if we find any. If not, look for
virtual externals that are attached over hardware connections, i.e.
to hardware switches. The problem being that virtual switches do not
support ACLs and so multiple external peerings will also make those
VPCs peer among each other, breaking the tests.

Also warn about conditions that will lead to some tests being skipped.

Signed-off-by: Emanuele Di Pascale <[email protected]>
if no [mclag/eslag] leaves are present, the corresponding
servers will not be generated, but the log output was
still mentioning the default values, which can be confusing.

Signed-off-by: Emanuele Di Pascale <[email protected]>
* add 'bestpath as-path multipath relax' to have multiple
  ECMP routes to the vpc subnets in case of multiple
  external connections in the same VRF
* remove the 'maximum-paths 1' constraint for the same
  reason
* enable 'soft-reconfiguration inbound' for troubleshooting

Signed-off-by: Emanuele Di Pascale <[email protected]>
the previous check was skipping external connections that were
hardware annotated, but that is incorrect. what we really want
is to perfrom the extra config steps (adding a link towards the
externals and the VRF information) only if the connection is
used to connect to virtual externals. the check is slightly
complicated by the fact that we need to iterate over the attachments,
but now things should work properly.

Signed-off-by: Emanuele Di Pascale <[email protected]>
similar to what we do in the hardware external right now, to
avoid peering VPCs at the external

Signed-off-by: Emanuele Di Pascale <[email protected]>
@edipascale edipascale force-pushed the ema/external-autogen branch from 59e6912 to 2a5ecf3 Compare August 29, 2025 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate Externals and their connections + attachments for VLAB
2 participants