Skip to content

Initial implementation for switch port settings #426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

internet-diglett
Copy link

No description provided.

@sudomateo
Copy link
Collaborator

sudomateo commented Apr 18, 2025

@internet-diglett and I spent a few hours today updating the implementation to fix some issues with serialization and typing as well as chatting about a path forward. There's currently an issue with the nested addresses.addresses.address_lot.id attribute being computed that makes Terraform want to replace the resource every run.

EDIT: See #418 (comment) for more details but schema.*NestedAttribute is what we should be using, not schema.Block.

I'm not exactly sure why that is but perhaps this resource is a good candidate for nested blocks described in #418 rather than the nested arrays of objects.

I tested the following schema locally and did not run into the aforementioned issue.

Blocks: map[string]schema.Block{
    "link": schema.ListNestedBlock{
        NestedObject: schema.NestedBlockObject{
            Attributes: map[string]schema.Attribute{
                "name": schema.StringAttribute{
                    Required: true,
                },
            },
            Blocks: map[string]schema.Block{
                "address": schema.ListNestedBlock{
                    NestedObject: schema.NestedBlockObject{
                        Attributes: map[string]schema.Attribute{
                            "addr": schema.StringAttribute{
                                Required: true,
                            },
                        },
                        Blocks: map[string]schema.Block{
                            "address_lot": schema.SingleNestedBlock{
                                Attributes: map[string]schema.Attribute{
                                    "id": schema.StringAttribute{
                                        Computed: true,
                                    },
                                    "name": schema.StringAttribute{
                                        Required: true,
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    },
},

Here's what the configuration ended up looking like.

resource "oxide_switch_port_configuration" "example" {
  name        = "foo"
  description = "Managed by Terraform."

  link {
    name = "foo"
    address {
      addr = "foo"
      address_lot {
        name = "bar"
      }
    }
  }
}

And no more delete and create anew!

> terraform plan
oxide_switch_port_configuration.example: Refreshing state... [id=973beb3d-e658-40c3-8dba-5b3d10ef42c0]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

@sudomateo
Copy link
Collaborator

sudomateo commented Apr 18, 2025

I did some more testing and if you change the schema for both addresses from schema.SetNestedAttribute to schema.ListNestedAttribute then there is no longer an issue with plans wanting to replace the nested attributes. I'm not sure what's actually happening under the hood to change that but it works.

I still maintain that we should push for the more modern schema.Block usage but that can be discussed later.

EDIT: See #418 (comment) for more details but schema.*NestedAttribute is what we should be using, not schema.Block.

internet-diglett added a commit to oxidecomputer/omicron that referenced this pull request May 31, 2025
This PR aims to improve the usability of the switch port settings API by
returning data of similar shape and content as what users provide when
the create the switch port settings.

Issues resolved:
* Before this PR, you only need one call to create a complete switch
port settings object, but you have to perform several lookups to get the
complete switch port configuration. We were already returning the object
ids of the nested objects, so it makes a lot of sense to just return the
actual object (especially since the user supplied the entire object when
they configured it)
* The `HashMap<String_of_link_name, T>` pattern also did not give the
expected behavior in our go client generation. Changing this to a
`Vec<T_with_link_name_inside>` matches what the user provides during
creation and also resolves this behavior.
* Some of our `Vec<T>` fields can be empty but must be present. In the
Go client, an empty array is omitted from the json object. We needed to
add a `default` annotation to these fields so Serde handles this
situation correctly.

## Related
- oxidecomputer/oxide.go#278
- oxidecomputer/terraform-provider-oxide#426

Closes #5780
hawkw pushed a commit to oxidecomputer/omicron that referenced this pull request Jun 2, 2025
This PR aims to improve the usability of the switch port settings API by
returning data of similar shape and content as what users provide when
the create the switch port settings.

Issues resolved:
* Before this PR, you only need one call to create a complete switch
port settings object, but you have to perform several lookups to get the
complete switch port configuration. We were already returning the object
ids of the nested objects, so it makes a lot of sense to just return the
actual object (especially since the user supplied the entire object when
they configured it)
* The `HashMap<String_of_link_name, T>` pattern also did not give the
expected behavior in our go client generation. Changing this to a
`Vec<T_with_link_name_inside>` matches what the user provides during
creation and also resolves this behavior.
* Some of our `Vec<T>` fields can be empty but must be present. In the
Go client, an empty array is omitted from the json object. We needed to
add a `default` annotation to these fields so Serde handles this
situation correctly.

## Related
- oxidecomputer/oxide.go#278
- oxidecomputer/terraform-provider-oxide#426

Closes #5780
@internet-diglett internet-diglett marked this pull request as ready for review June 19, 2025 03:54
@internet-diglett internet-diglett changed the title WIP: initial implementation for switch port settings Initial implementation for switch port settings Jun 20, 2025
@sudomateo sudomateo force-pushed the switch-port-settings-resource branch from fbde127 to bf25088 Compare June 27, 2025 19:09
@sudomateo
Copy link
Collaborator

Quick update here. I haven't forgotten about this. A few things came up this week that I was handling first. Will review this next week.

internet-diglett and others added 3 commits July 10, 2025 15:11
This patch made the following changes.

* Renamed the resource from `oxide_switch_port_configuration` to
`oxide_switch_port_settings` to be consistent with the upstream API.

* Changed the schema to reflect the schema from the
`networking_switch_port_settings_create` API. Further
work is required to account for the schema returned by the
`networking_switch_port_settings_view` API.

* Created new functions to convert to and from Terraform models to Oxide
models.
@sudomateo sudomateo force-pushed the switch-port-settings-resource branch from bf25088 to 416afe0 Compare July 11, 2025 01:18
@sudomateo
Copy link
Collaborator

I made the following changes in 416afe0.

  • Renamed the resource from oxide_switch_port_configuration to oxide_switch_port_settings to be consistent with the upstream API.

  • Changed the schema to reflect the schema from the networking_switch_port_settings_create API. Further work is required to account for the schema returned by the networking_switch_port_settings_view API (e.g., nested IDs).

I following Running Omicron (Simulated) and used it to test these changes locally with the following configuration.

resource "oxide_switch_port_settings" "example" {
  name        = "example"
  description = "example"

  addresses = [
    {
      link_name = "phy0"
      addresses = [
        {
          address     = "10.85.0.30/24"
          address_lot = "example"
        },
      ]
    },
  ]

  links = [
    {
      autoneg = false
      fec     = "none"
      lldp = {
        enabled = false
      }
      mtu       = 1500
      link_name = "phy0"
      speed     = "speed1_g"
    },
  ]

  port_config = {
    geometry = "qsfp28x1"
  }

  routes = [
    {
      link_name = "phy0"
      routes = [
        {
          dst = "0.0.0.0/0"
          gw  = "10.85.0.1"
        },
        {
          dst = "1.1.1.1/32"
          gw  = "10.85.0.1"
        },
      ]
    },
  ]
}

However, I received the following error.

oxide_switch_port_settings.example: Creating...
╷
│ Error: Error creating switch port settings
│
│   with oxide_switch_port_settings.example,
│   on main.tf line 16, in resource "oxide_switch_port_settings" "example":
│   16: resource "oxide_switch_port_settings" "example" {
│
│ API error: POST http://127.0.0.1:12220/v1/system/networking/switch-port-settings
│ ----------- RESPONSE -----------
│ Status: 400
│ Message: unable to parse JSON body: addresses[0].addresses[0].address: data did not match any variant of untagged enum IpNet at
│ line 1 column 41
│ RequestID: 321a037c-4ccd-4a86-8c94-581e382a23f9
│ ------- RESPONSE HEADERS -------
│ Content-Type: [application/json]
│ X-Request-Id: [321a037c-4ccd-4a86-8c94-581e382a23f9]
│ Content-Length: [210]
│ Date: [Thu, 10 Jul 2025 22:55:25 GMT]
│
╵

It's likely that a simulated Omicron isn't sufficient to test these changes since that environment does not run the Rack Setup Service (RSS) to initialize the networking settings. I'll pair with @internet-diglett tomorrow since they have environments to test against.

The sheer size of the networking_switch_port_settings_create and networking_switch_port_settings_view APIs make this resource a behemoth to work with. Additionally, the APIs are asymmetric in the sense that the networking_switch_port_settings_create API takes a request body in a nested format (e.g., addresses[].addresses[].address) but the response body and the networking_switch_port_settings_view API return a non-nested format (e.g., addresses[].address). This, coupled with the link_name being adjacent to the array of resources it's managing on create but being nested within each array element on read, means that the Terraform resource needs to perform additional logic to ensure consistency that's not required by other APIs.

@karencfv
Copy link
Collaborator

Oof, looks like more fun with enums and Go 🙃

Looking at that error, I think the API isn't accepting your POST request, it's a 400 so I don't think this would work anywhere, sadly.

unable to parse JSON body: addresses[0].addresses[0].address: data did not match any variant of untagged enum IpNet

I think the API isn't recognising the way you are passing the "10.85.0.30/24" as an IpNet variant. I might be wrong but perhaps you may be missing a ValueString() when constructing NetworkingSwitchPortSettingsCreateParams?

Copy link
Collaborator

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the entire PR but took a quick look at where the error you were encountering might be occurring. I hope this helps!

@sudomateo
Copy link
Collaborator

Thanks Karen! I thought I covered all those oxide.IpNet cases but I was mistaken.

With the changes in 3d7d79a I'm now getting the following error.

oxide_switch_port_settings.example: Creating...
╷
│ Error: Error creating switch port settings
│
│   with oxide_switch_port_settings.example,
│   on main.tf line 16, in resource "oxide_switch_port_settings" "example":
│   16: resource "oxide_switch_port_settings" "example" {
│
│ API error: POST http://127.0.0.1:12220/v1/system/networking/switch-port-settings
│ ----------- RESPONSE -----------
│ Status: 500 Internal
│ Message: Internal Server Error
│ RequestID: 12b35f5a-2255-4244-a5aa-f9c84032fb81
│ ------- RESPONSE HEADERS -------
│ Content-Type: [application/json]
│ X-Request-Id: [12b35f5a-2255-4244-a5aa-f9c84032fb81]
│ Content-Length: [124]
│ Date: [Fri, 11 Jul 2025 13:57:48 GMT]
│
╵

Which the logs tell me is an issue with the serialization for local_pref. Will debug.

{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}

@sudomateo
Copy link
Collaborator

I was thinking the local_pref name referred to the attribute under bgp_peers. It doesn't. It actually refers to the rib_priority attribute under routes. The rib_priority attribute used to be called local_pref until it was renamed in oxidecomputer/omicron#6693. The error that's being reported is coming from diesel, which still uses the local_pref name as the name of the column in the database.

In an effort to eliminate uncertainty from the Go SDK and Terraform, I tested against the API directly with the following request body. Note rib_priority is set to 0.

{
  "addresses": [
    {
      "addresses": [
        {
          "address": "10.85.0.30/24",
          "address_lot": "example",
          "vlan_id": 0
        }
      ],
      "link_name": "phy0"
    }
  ],
  "bgp_peers": [],
  "description": "example",
  "groups": [],
  "links": [
    {
      "autoneg": false,
      "fec": "none",
      "link_name": "phy0",
      "lldp": {
        "enabled": false
      },
      "mtu": 1500,
      "speed": "speed1_g",
      "tx_eq": {}
    }
  ],
  "name": "example",
  "port_config": {
    "geometry": "qsfp28x1"
  },
  "routes": [
    {
      "link_name": "phy0",
      "routes": [
        {
          "dst": "0.0.0.0/0",
          "gw": "10.85.0.1",
          "rib_priority": 0
        }
      ]
    }
  ]
}

I still received an internal server error.

$ curl --request POST --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/system/networking/switch-port-settings --data @/tmp/settings.json
{
  "request_id": "6fdee63a-3630-404b-8d37-335ffcedc38b",
  "error_code": "Internal",
  "message": "Internal Server Error"
}

With the same error as before.

{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T15:39:23.279978169Z","hostname":"ms-ox01","pid":43902,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"6fdee63a-3630-404b-8d37-335ffcedc38b","remote_addr":"127.0.0.1:48642","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":129375,"response_code":500}

I then removed rib_priority from the request body.

diff --git a/tmp/settings.json b/tmp/settings.json
index de538ff54..ce48cdd43 100644
--- a/tmp/settings.json
+++ b/tmp/settings.json
@@ -38,7 +38,6 @@
         {
           "dst": "0.0.0.0/0",
           "gw": "10.85.0.1",
-          "rib_priority": 0
         }
       ]
     }

Now the error is AddressLot not found, which I believe is expected since my simulated setup does not have a true networking stack configured.

$ curl --request POST --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/system/networking/switch-port-settings --data @/tmp/settings.json
{
  "request_id": "bb5dfa11-f570-4022-af91-4ec3085440d9",
  "error_code": "InvalidRequest",
  "message": "AddressLot not found"
}

I then pushed aaa884d to allow setting rib_priority and vid to nil and Terraform is now reporting the same error when rib_priority is omitted from the configuration.

╷
│ Error: Error creating switch port settings
│
│   with oxide_switch_port_settings.example,
│   on main.tf line 16, in resource "oxide_switch_port_settings" "example":
│   16: resource "oxide_switch_port_settings" "example" {
│
│ API error: POST http://127.0.0.1:12220/v1/system/networking/switch-port-settings
│ ----------- RESPONSE -----------
│ Status: 400 InvalidRequest
│ Message: AddressLot not found
│ RequestID: 67fcc057-acce-4a41-ae2e-b051f90e1e28
│ ------- RESPONSE HEADERS -------
│ Content-Length: [129]
│ Date: [Fri, 11 Jul 2025 15:50:36 GMT]
│ Content-Type: [application/json]
│ X-Request-Id: [67fcc057-acce-4a41-ae2e-b051f90e1e28]
│
╵

If I'm interpreting all this correctly then somewhere in Omicron rib_priority's Option<u8> is becoming larger than diesel's Int2.

sudomateo added a commit to oxidecomputer/omicron that referenced this pull request Jul 12, 2025
I renamed the `local_pref` column to `rib_priority` to complete the
rename in #6693.

I also changed the type of the renamed column from `INT8` to
`INT2`, clamping existing values to `0` or `255`. This was missed
in #6693 and led to the
following error when reading the value from the database.

```
{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}
```

The error occurred because the Rust type was `SqlU8` and the database
type was `INT8`. Reads would fail because `INT8` columns could not be
read into `SqlU8` types without loss of precision.

This was caught in
oxidecomputer/terraform-provider-oxide#426
when implementing a Terraform provider resource for switch
port settings. It's likely that this has been broken since
#6693 and any user that
attempted to set `rib_priority` in their Rack Setup Service (RSS) would
have encountered this error. However, `rib_priority` is an uncommon
configuration option and none of our customer's RSS configurations show
this as being set.

Given this information, it seems safe to assume that no customer has the
`rib_priority` column set so the clamping logic implemented here should
work well for customer installations.
@sudomateo
Copy link
Collaborator

I submitted oxidecomputer/omicron#8587 to address the issue with rib_priority in Omicron. With the changes in aaa884d I'm not blocked and can proceed with testing this against an environment with proper networking configuration. I'll sync with @internet-diglett to do this on Monday.

sudomateo added a commit to oxidecomputer/omicron that referenced this pull request Jul 12, 2025
I renamed the `local_pref` column to `rib_priority` to complete the
rename in #6693.

I also changed the type of the renamed column from `INT8` to
`INT2`, clamping existing values to `0` or `255`. This was missed
in #6693 and led to the
following error when reading the value from the database.

```
{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}
```

The error occurred because the Rust type was `SqlU8` and the database
type was `INT8`. Reads would fail because `INT8` columns could not be
read into `SqlU8` types without loss of precision.

This was caught in
oxidecomputer/terraform-provider-oxide#426
when implementing a Terraform provider resource for switch
port settings. It's likely that this has been broken since
#6693 and any user that
attempted to set `rib_priority` in their Rack Setup Service (RSS) would
have encountered this error. However, `rib_priority` is an uncommon
configuration option and none of our customer's RSS configurations show
this as being set.

Given this information, it seems safe to assume that no customer has the
`rib_priority` column set so the clamping logic implemented here should
work well for customer installations.
@sudomateo
Copy link
Collaborator

It would be really nice to have a resource for this! (Not as part of this PR of course 😅 )

What would we need? A resource for /v1/system/networking/address-lot and a datasource for /v1/system/networking/address-lot/{ID}/blocks? Anything else? We should open up some follow up issues for this

Haha. Yeah definitely not in this pull request. We'll need the following.

@sudomateo
Copy link
Collaborator

Much better! The initial resource is created.

> terraform apply
...
oxide_switch_port_settings.example: Creating...
oxide_switch_port_settings.example: Creation complete after 0s [id=28189c9a-28fd-421d-92dd-ac2d99f41b80]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

With a no-op plan immediately afterwards.

> terraform apply
oxide_switch_port_settings.example: Refreshing state... [id=28189c9a-28fd-421d-92dd-ac2d99f41b80]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

@sudomateo
Copy link
Collaborator

sudomateo commented Jul 16, 2025

I just have the following work left before this is ready for merge.

  • Update the acceptance tests.
  • (Optional) Update the schema to return compute attributes that are only present in the create response body (e.g., addresses[].address_lot_id). This would require either changing from SetNestedAttribute to ListNestedAttribute or adding a new computed only attribute that holds this data due to limitations with Terraform's SetNestedAttribute. This API is asymmetric where the create request body takes addresses[].addresses[].address_lot_id and the response body returns addresses[].address_lot_id. Due to that asymmetry I'm reluctant to switch to ListNestedAttribute and would rather separate the "create" schema from the "read-only" schema. Going to give it some thought though as this can also be addressed in a subsequent pull request.

@sudomateo
Copy link
Collaborator

This is ready for review! I'm going to defer on the ListNestedAttribute suggestion in #426 (comment) due to time constraints unless @internet-diglett informs me that it's a hard blocker.

I ran the acceptance tests against a simulated Omicron deployment and they pass. The output is below. I also added a lengthy comment to the TestAccCloudResourceSwitchPortSettings_full function to document how to safely run the acceptance tests.

> TEST_ACC_NAME=TestAccCloudResourceSwitchPortSettings_full OXIDE_TEST_ADDRESS_LOT_ID=a7d5849e-c017-4af8-a245-fb3d48fc9912 make testacc
-> Running terraform acceptance tests
=== RUN   TestAccCloudResourceSwitchPortSettings_full
=== PAUSE TestAccCloudResourceSwitchPortSettings_full
=== CONT  TestAccCloudResourceSwitchPortSettings_full
--- PASS: TestAccCloudResourceSwitchPortSettings_full (1.36s)
PASS
ok      github.com/oxidecomputer/terraform-provider-oxide/internal/provider     1.366s

Copy link
Collaborator

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for working on this. It looks like a mammoth undertaking 😅 . Let's add some documentation for the resource and an entry to the changelog as well :)

This one is worth having an example in the examples directory too I think 🤔

@sudomateo
Copy link
Collaborator

Will get to the rest of the open discussion items tomorrow. Thank you for the review!

@sudomateo sudomateo self-assigned this Jul 17, 2025
sudomateo added a commit to oxidecomputer/omicron that referenced this pull request Jul 17, 2025
I renamed the `local_pref` column to `rib_priority` to complete the
rename in #6693.

I also changed the type of the renamed column from `INT8` to
`INT2`, clamping existing values to `0` or `255`. This was missed
in #6693 and led to the
following error when reading the value from the database.

```
{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}
```

The error occurred because the Rust type was `SqlU8` and the database
type was `INT8`. Reads would fail because `INT8` columns could not be
read into `SqlU8` types without loss of precision.

This was caught in
oxidecomputer/terraform-provider-oxide#426
when implementing a Terraform provider resource for switch
port settings. It's likely that this has been broken since
#6693 and any user that
attempted to set `rib_priority` in their Rack Setup Service (RSS) would
have encountered this error. However, `rib_priority` is an uncommon
configuration option and none of our customer's RSS configurations show
this as being set.

Given this information, it seems safe to assume that no customer has the
`rib_priority` column set so the clamping logic implemented here should
work well for customer installations.
sudomateo added a commit to oxidecomputer/omicron that referenced this pull request Jul 18, 2025
I renamed the `local_pref` column to `rib_priority` to complete the
rename in #6693.

I also changed the type of the renamed column from `INT8` to `INT2`,
clamping existing values to `0` or `255`. This was missed in
#6693 and led to the
following error when reading the value from the database.

```
{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}
```

The error occurred because the Rust type was `SqlU8` and the database
type was `INT8`. Reads would fail because `INT8` columns could not be
read into `SqlU8` types without loss of precision.

This was caught in
oxidecomputer/terraform-provider-oxide#426 when
implementing a Terraform provider resource for switch port settings.
It's likely that this has been broken since
#6693 and any user that
attempted to set `rib_priority` in their Rack Setup Service (RSS) would
have encountered this error. However, `rib_priority` is an uncommon
configuration option and none of our customer's RSS configurations show
this as being set.

Given this information, it seems safe to assume that no customer has the
`rib_priority` column set so the clamping logic implemented here should
work well for customer installations.
Copy link
Collaborator

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments so quickly! Just a few bits to go, and should be ready soon

Comment on lines +786 to +793
// 2. This function assumes null values by default to prevent Terraform from
// having a non-empty refresh plan right after a successful apply. For example,
// assume the Terraform configuration omits the `bgp_peers` attribute. After an
// apply, the Oxide `switch_port_settings_create` API returns `"bgp_peers": []`
// and Terraform plans must see this as a null value to match the configuration.
// Otherwise, the Terraform refresh plan will read the value `[]` and attempt to
// set it back to null since the configuration had `bgp_peers` omitted. This is
// why you'll see `if len(settings.Addresses) > 0` conditions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. bgp_peers is optional, so I can see why we'd ignore an empty field in the response. What about the required ones (Addresses and Links)? We could safely assume that they would never be empty in the response no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bgp_peers is optional in the request but required in the response. For asymmetric cases like this we want to treat [] as null in Terraform to avoid the non-empty refresh plan. Since the nested structure within bgp_peers is required a user cannot set bgp_peers = [] in their Terraform configuration so to remove BGP peers they would just omit it from the configuration (i.e., null). That's what the if len() > 0 condition covers.

For the attributes that are symmetrical (i.e., addresses) where they are required in the request and required in the response we could remove the if len() > 0 check` if we desired and the behavior would remain the same. I don't have a preference here since the positive case of length being non-zero is what will be executed all the time.

Comment on lines 50 to 141
initialConfigTmpl := `
resource "oxide_switch_port_settings" "{{.BlockName}}" {
name = "{{.SwitchPortSettingsName}}"
description = "Terraform acceptance testing."

port_config = {
geometry = "qsfp28x1"
}

addresses = [
{
link_name = "phy0"
addresses = [
{
address = "0.0.0.0/0"
address_lot_id = "{{.AddressLotID}}"
},
]
},
]

links = [
{
link_name = "phy0"
autoneg = false
mtu = 1500
speed = "speed1_g"
lldp = {
enabled = false
}
},
]

routes = [
{
link_name = "phy0"
routes = [
{
dst = "0.0.0.0/0"
gw = "0.0.0.0"
},
]
},
]
}
`
updateConfigTmpl := `
resource "oxide_switch_port_settings" "{{.BlockName}}" {
name = "{{.SwitchPortSettingsName}}"
description = "Terraform acceptance testing (updated)."

port_config = {
geometry = "qsfp28x1"
}

addresses = [
{
link_name = "phy0"
addresses = [
{
address = "0.0.0.0/0"
address_lot_id = "{{.AddressLotID}}"
},
]
},
]

links = [
{
link_name = "phy0"
autoneg = false
mtu = 1500
speed = "speed1_g"
lldp = {
enabled = true
}
},
]

routes = [
{
link_name = "phy0"
routes = [
{
dst = "0.0.0.0/0"
gw = "0.0.0.0"
},
]
},
]
}
`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! Just noticed this test doesn't include any of the optional attributes. Let's add in one of the two configurations some values for bgp_peers, groups, interfaces, and all of the optional fields.

Copy link
Collaborator

@sudomateo sudomateo Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added interfaces for now in ccb557d.

Attempted to add bgp_peers but that required manually creating a BGP announce set and BGP configuration. Also, when I went to apply this I ran into an API type error that I noticed the other day but didn't dig into. I created oxidecomputer/oxide.go#303 to track this.

I'm not sure what groups is meant to represent here. @internet-diglett any context you can provide here?

Comment on lines +1157 to +1160
// toNetworkingSwitchPortSettingsCreateParams converts [switchPortSettingsModel`
// to [oxide.NetworkingSwitchPortSettingsCreateParams]. This is far simpler than
// [toSwitchPortSettingsModel] since the Oxide `switch_port_settings_create` API
// request body matches the Terraform schema.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// toNetworkingSwitchPortSettingsCreateParams converts [switchPortSettingsModel`
// to [oxide.NetworkingSwitchPortSettingsCreateParams]. This is far simpler than
// [toSwitchPortSettingsModel] since the Oxide `switch_port_settings_create` API
// request body matches the Terraform schema.
// toNetworkingSwitchPortSettingsCreateParams converts `switchPortSettingsModel`
// to `oxide.NetworkingSwitchPortSettingsCreateParams`. This is far simpler than
// `toSwitchPortSettingsModel` since the Oxide `switch_port_settings_create` API
// request body matches the Terraform schema.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used [ and ] since that's Go doc link syntax and it supports go to definition in LSPs. Happy to change it. Just giving context.

Comment on lines +779 to +784
// 1. This function handles the asymmetrical nature of the
// `switch_port_settings_create` API. For example, the request body
// accepts attributes such as `addresses[].addresses[].address` but returns
// `addresses[].address`. This function handles that conversion by creating a
// map of all the respective configurations for a given link name and using it
// to populate the nested model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about this one. I'm sure you both talked about this already. But just want to know, @internet-diglett, why can't the API return the same addresses data structure that the switch port settings were created with?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't talk about the asymmetry of the API in much detail. As I understand it this Terraform resource is urgently needed to quickly test switch port settings for ongoing networking work. Some of that work is for validating different Oxide deployments and some of that work is going to be used to update the API design for these networking APIs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably no reason it can't, just that we haven't shaped it that way today. The primary focus with this segment of work was to get Terraform configuration working and fixing anything in the API that fundamentally prevented it from working. This implementation will not be the final form. Going forward we will be extracting certain parts of the config into separate resources because we have determined they do not need to be tied to a switch port (like routes), adding things like etags, ipv6 support, etc. so we kinda had to draw a line somewhere for the scope of work we wanted to focus on for the time being.

@sudomateo
Copy link
Collaborator

sudomateo commented Jul 18, 2025

@internet-diglett when I create multiple links with lldp_link_config enabled the API response does not show them both as enabled. Notice how the first element has a null lldp_link_config but the second element doesn't. This is the output retrieved directly from the /v1/system/networking/switch-port-settings/example API.

  "links": [
    {
      "port_settings_id": "698f4600-0f9d-48d0-ad8b-caf00e7b0e92",
      "link_name": "phy0",
      "mtu": 1500,
      "fec": null,
      "speed": "speed1_g",
      "autoneg": false,
      "lldp_link_config": null,
      "tx_eq_config": null
    },
    {
      "port_settings_id": "698f4600-0f9d-48d0-ad8b-caf00e7b0e92",
      "link_name": "phy1",
      "mtu": 1500,
      "fec": null,
      "speed": "speed10_g",
      "autoneg": false,
      "lldp_link_config": {
        "id": "42f34fbd-6556-4dff-ab4b-e387ed7a761a",
        "enabled": true,
        "link_name": "phy0",
        "link_description": null,
        "chassis_id": null,
        "system_name": null,
        "system_description": null,
        "management_ip": null
      },
      "tx_eq_config": null
    }
  ],

This results in a non-empty refresh plan.

> terraform apply
oxide_switch_port_settings.example: Refreshing state... [id=698f4600-0f9d-48d0-ad8b-caf00e7b0e92]

Terraform will perform the following actions:

  # oxide_switch_port_settings.example will be updated in-place
  ~ resource "oxide_switch_port_settings" "example" {
      ~ id            = "698f4600-0f9d-48d0-ad8b-caf00e7b0e92" -> (known after apply)
      ~ links         = [
          - {
              - autoneg   = false -> null
              - link_name = "phy1" -> null
              - mtu       = 1500 -> null
              - speed     = "speed10_g" -> null
            },
          + {
              + autoneg   = false
              + link_name = "phy1"
              + lldp      = {
                  + enabled   = true
                  + link_name = "phy0"
                }
              + mtu       = 1500
              + speed     = "speed10_g"
            },
            # (1 unchanged element hidden)
        ]
        name          = "example"
      ~ time_created  = "2025-07-18 16:37:02.914391 +0000 UTC" -> (known after apply)
      ~ time_modified = "2025-07-18 16:37:02.914391 +0000 UTC" -> (known after apply)
        # (5 unchanged attributes hidden)
    }

Wondering if this is a bug in the Omicron API.

Given the goals of this resource discussed in #426 (comment) this isn't a blocker. I just wanted to note it.

@internet-diglett
Copy link
Author

@sudomateo

Wondering if this is a bug in the Omicron API.

I've created an issue here:
oxidecomputer/omicron#8640

Can you attach your terraform config to that for reproduction and testing?

@sudomateo
Copy link
Collaborator

I've created an issue here: oxidecomputer/omicron#8640

Can you attach your terraform config to that for reproduction and testing?

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants