Commit c104a88

Merge pull request #1315 from rackerlabs/hardware-categorization
feat: Add hardware categorization design
2 parents 50893a6 + de2ad0e commit c104a88

22 files changed

+3333
-71
lines changed

docs/design-guide/device-types.md

Lines changed: 280 additions & 0 deletions
@@ -0,0 +1,280 @@
# Device Types

Device types provide a structured way to define and categorize hardware models
supported by UnderStack. They serve as a declarative specification of hardware
characteristics, enabling consistent hardware identification, resource class
mapping, and infrastructure automation across the platform.

## Purpose and Architecture

Device type definitions solve several critical challenges in bare metal
infrastructure management:

* **Hardware Identification**: Precise specification of manufacturer, model,
  and physical attributes enables automated detection and categorization
* **Resource Classification**: Multiple resource class configurations per
  device type allow flexible mapping of the same hardware model to different
  Nova flavors and workload profiles
* **Infrastructure as Code**: Hardware specifications live in Git alongside
  deployment configurations, providing versioning, review, and audit capabilities
* **Cross-Platform Integration**: Device types integrate with Nautobot,
  Ironic, and Nova to provide consistent hardware metadata throughout the stack

## Schema Structure

Device type definitions follow the [device-type.schema.json](https://github.com/rackerlabs/understack/blob/main/schema/device-type.schema.json)
JSON Schema, which enforces validation and consistency across all definitions.

### Common Properties

All device types must specify:

* **class**: Device category - `server`, `switch`, or `firewall`
* **manufacturer**: Hardware vendor (e.g., "Dell", "HPE")
* **model**: Specific model identifier (e.g., "PowerEdge R7615")
* **u_height**: Rack unit height (must be greater than 0)
* **is_full_depth**: Boolean indicating full-depth rack mounting

### Optional Properties

Device types may include:

* **interfaces**: Named physical network interfaces on the hardware. Used to
  define specific ports such as management interfaces (BMC/iDRAC/iLO) or
  named switch ports. Each interface has:
    * `name`: Interface identifier (e.g., "iDRAC", "eth0", "mgmt")
    * `type`: Physical interface type (e.g., "1000base-t", "10gbase-x-sfp+")
    * `mgmt_only`: Boolean flag indicating management-only interfaces
* **power-ports**: Power inlet specifications for the device. Each power port has:
    * `name`: Power port identifier (e.g., "psu1", "psu2")
    * `type`: Power port connector type (e.g., "iec-60320-c14", "iec-60320-c20") - see [Nautobot PowerPortTypeChoices](https://github.com/nautobot/nautobot/blob/develop/nautobot/dcim/choices.py#L507) for valid values
    * `maximum_draw`: Maximum power draw in watts (optional)
* **resource_class**: Array of resource class configurations (required for
  `class: server`)

### Resource Classes

For server-class devices, resource classes define the specific hardware
configurations that map to OpenStack Nova flavors. Multiple resource classes
can be defined for the same hardware model to represent common build
configurations in the data center (e.g., different CPU, RAM, or drive
populations of the same chassis).

During server enrollment, the hardware inspection data is matched against
these resource class definitions. The matching resource class name is set on
the Ironic node's `resource_class` property, which is then used to create
corresponding Nova flavors for workload scheduling.

Each resource class requires:

* **name**: Resource class identifier (e.g., "m1.small", "compute-optimized").
  This value will be set on the Ironic node and used for Nova flavor creation.
* **cpu**: Object with `cores` (number) and `model` (string)
* **memory**: Object with `size` in GB
* **drives**: Array of drive objects, each with `size` in GB
* **nic_count**: Minimum number of user-usable network interfaces (integer).
  This represents general-purpose network ports available for workload traffic,
  not tied to specific named interfaces. Used to verify the server has
  sufficient network connectivity for the workload profile.

## Example Definition

```yaml
# yaml-language-server: $schema=https://rackerlabs.github.io/understack/device-type.schema.json
class: server
manufacturer: Dell
model: PowerEdge R7615
u_height: 2
is_full_depth: true

# Named physical interfaces (management, specific ports)
interfaces:
  - name: iDRAC
    type: 1000base-t
    mgmt_only: true

# Power inlet specifications
power-ports:
  - name: psu1
    type: iec-60320-c14
    maximum_draw: 750
  - name: psu2
    type: iec-60320-c14
    maximum_draw: 750

resource_class:
  - name: m1.small
    cpu:
      cores: 16
      model: AMD EPYC 9124
    memory:
      size: 128
    drives:
      - size: 480
      - size: 480
    # User-usable network interfaces (not tied to specific named ports)
    nic_count: 2
```

## Integration Points

### GitOps Deployment

Device type definitions live in the deployment repository under
`hardware/device-types/`. They are packaged as Kubernetes ConfigMaps via
Kustomize, making them available to platform components.

### Resource Class Matching and Nova Flavors

During bare metal enrollment:

1. Hardware is inspected via Ironic to collect CPU, memory, drive, and network
   interface data
2. The `understack-flavor-matcher` service compares inspection data against
   device type resource class definitions
3. When a match is found, the resource class name is set on the Ironic node's
   `resource_class` property
4. Nova flavors are created or updated based on the resource class, making the
   hardware available for workload scheduling

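Step 4 does not spell out how a resource class name becomes a Nova flavor. As a minimal sketch, assuming UnderStack follows the convention described in the upstream Ironic documentation (uppercase the name, replace punctuation with underscores, prefix with `CUSTOM_`, and zero out the standard resources), the flavor extra specs could be derived as follows. The function names `custom_resource_class` and `flavor_extra_specs` are hypothetical and not part of UnderStack.

```python
import re


def custom_resource_class(name: str) -> str:
    """Convert an Ironic resource class name to a Placement custom class.

    Follows the upstream convention: uppercase, replace every character
    that is not A-Z, 0-9, or underscore with an underscore, add CUSTOM_.
    """
    normalized = re.sub(r"[^A-Z0-9_]", "_", name.upper())
    return f"CUSTOM_{normalized}"


def flavor_extra_specs(name: str) -> dict[str, str]:
    """Build extra specs for a bare metal flavor (illustrative only)."""
    return {
        # Request exactly one unit of the custom resource class.
        f"resources:{custom_resource_class(name)}": "1",
        # Bare metal flavors do not consume the standard resources.
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
    }


print(custom_resource_class("m1.small"))  # CUSTOM_M1_SMALL
```

With this convention, a node whose `resource_class` is `m1.small` is scheduled by a flavor requesting `resources:CUSTOM_M1_SMALL=1`.
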
**Multiple Resource Classes**: Define multiple resource classes for the same
device type when you have common build variations of the same chassis. For
example, a Dell PowerEdge R7615 might be populated with different CPU models,
RAM capacities, or drive configurations depending on the intended workload
(compute, storage, memory-intensive, etc.).

### Nautobot Synchronization

Device types provide the source of truth for hardware specifications that are
synchronized to Nautobot's device type models, ensuring consistency between
the deployment repository and the infrastructure CMDB.

### Ironic Integration

During bare metal enrollment and inspection, Ironic driver metadata is
validated against device type definitions to confirm hardware matches
expected specifications.

## File Organization

Device type definitions are organized in the deployment repository:

```text
hardware/
├── base/
│   └── kustomization.yaml          # ConfigMap generation
└── device-types/
    ├── dell-poweredge-r7615.yaml
    ├── hpe-proliant-dl360.yaml
    └── ...
```

The `base/kustomization.yaml` generates a ConfigMap containing all device
type definitions:

```yaml
configMapGenerator:
  - name: device-types
    options:
      disableNameSuffixHash: true
    files:
      - dell-poweredge-r7615.yaml=../device-types/dell-poweredge-r7615.yaml
```

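Each entry under `files` follows the `<key>=<relative path>` pattern shown above. As a rough illustration of that pattern only, the sketch below builds the entries from a list of device type paths. The helper name `config_map_file_entries` is hypothetical; it is not how `understackctl` maintains the Kustomization.

```python
from pathlib import PurePosixPath


def config_map_file_entries(paths: list[str]) -> list[str]:
    """Build configMapGenerator `files` entries of the form key=path.

    The key is the bare file name, and the path is relative to
    hardware/base/, matching the layout shown in the tree above.
    """
    entries = []
    for path in sorted(paths):
        name = PurePosixPath(path).name
        entries.append(f"{name}=../device-types/{name}")
    return entries


for entry in config_map_file_entries(
    ["hardware/device-types/dell-poweredge-r7615.yaml"]
):
    print(entry)
```
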
## Schema Validation

Device type files include a YAML language server directive for editor-based
validation:

```yaml
# yaml-language-server: $schema=https://rackerlabs.github.io/understack/device-type.schema.json
```

The schema enforces:

* Required field presence
* Type correctness (strings, numbers, booleans, arrays, objects)
* Enum constraints (e.g., `class` must be server/switch/firewall)
* Conditional requirements (servers must have resource classes)
* Numeric constraints (e.g., `u_height > 0`)

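To make the rules above concrete, here is a hand-written sketch of the same checks in plain Python. It is not the JSON Schema itself and covers only the constraints listed in this document; the function name `validate_device_type` and the error messages are assumptions made for illustration.

```python
VALID_CLASSES = ("server", "switch", "firewall")
REQUIRED_FIELDS = ("class", "manufacturer", "model", "u_height", "is_full_depth")


def validate_device_type(definition: dict) -> list[str]:
    """Return a list of human-readable errors; empty means valid."""
    errors = []

    # Required field presence.
    for field in REQUIRED_FIELDS:
        if field not in definition:
            errors.append(f"missing required field: {field}")

    # Enum constraint on `class`.
    if "class" in definition and definition["class"] not in VALID_CLASSES:
        errors.append(f"class must be one of {', '.join(VALID_CLASSES)}")

    # Numeric constraint: u_height > 0 (bool is excluded explicitly
    # because Python treats it as a subclass of int).
    if "u_height" in definition:
        u_height = definition["u_height"]
        is_number = isinstance(u_height, (int, float)) and not isinstance(u_height, bool)
        if not is_number or u_height <= 0:
            errors.append("u_height must be a number greater than 0")

    # Type correctness for the boolean flag.
    if "is_full_depth" in definition and not isinstance(definition["is_full_depth"], bool):
        errors.append("is_full_depth must be a boolean")

    # Conditional requirement: servers must define resource classes.
    if definition.get("class") == "server" and not definition.get("resource_class"):
        errors.append("resource_class is required for class: server")

    return errors
```

For example, a `class: server` definition with no `resource_class` entries yields a single error, while a `class: switch` definition with the five common properties passes.
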
## Resource Class Assignment

When a device-type defines multiple resource classes for the same hardware model, the Ironic inspection process determines which resource class to assign to each discovered node through exact hardware matching.

### Inspection Hook Matching Logic

The `resource-class` inspection hook in `python/ironic-understack` performs the following steps:

1. **Hardware Discovery**: Ironic inspection discovers hardware specifications:
    * CPU cores and model
    * Memory size (in MB)
    * Drive sizes and count
    * System manufacturer and model

2. **Device-Type Matching**: Hook reads device-types ConfigMap and matches:
    * Manufacturer name (e.g., "Dell", "HPE")
    * Model name (e.g., "PowerEdge R7615")

3. **Resource Class Matching**: Within matched device-type, hook compares discovered specs against each resource class:
    * CPU details must match `cpu.cores` and `cpu.model` exactly
    * Memory size must match `memory.size` (converted to MB)
    * Drive count and sizes must match `drives` array
    * Network interface count must match `nic_count`

4. **Assignment**: Hook sets `resource_class` property on Ironic node to the matching resource class name

### Example Matching Scenario

Device-type definition with multiple resource classes:

```yaml
manufacturer: Dell
model: PowerEdge R7615
resource_class:
  - name: m1.small
    cpu: {cores: 16, model: AMD EPYC 9124}
    memory: {size: 128}
    drives: [{size: 480}, {size: 480}]
  - name: m1.medium
    cpu: {cores: 32, model: AMD EPYC 9334}
    memory: {size: 256}
    drives: [{size: 960}, {size: 960}]
```

Inspection discovers a Dell PowerEdge R7615 with 32 cores, 256 GB RAM, and two 960 GB drives:

* Matches device-type: Dell PowerEdge R7615
* Matches resource class: m1.medium (exact CPU/memory/drives match)
* Sets `node.resource_class = "m1.medium"`

### Matching Requirements

* **Exact matching**: All specs (CPU cores and model, memory size, drive count and sizes) must match exactly
* **No partial matches**: If any spec differs, the resource class is not matched
* **No match fallback**: If no resource class matches the discovered specs, inspection fails with an error
* **Drive order matters**: Drive sizes are matched in array order

This ensures predictable resource class assignment and prevents misconfiguration.

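The matching rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hook's actual code: the flat `inspected` dictionary, the names `match_resource_class` and `NoMatchingResourceClass`, and the 1024 MB-per-GB conversion are all hypothetical. `nic_count` is treated as a minimum, following its schema description, and is skipped when a resource class omits it.

```python
MB_PER_GB = 1024  # assumption: the source only says "converted to MB"


class NoMatchingResourceClass(Exception):
    """Raised when no resource class matches the inspected hardware."""


def _matches(resource_class: dict, inspected: dict) -> bool:
    """Return True only if every spec of the resource class matches."""
    expected_drives = [drive["size"] for drive in resource_class["drives"]]
    return (
        resource_class["cpu"]["cores"] == inspected["cpu_cores"]
        and resource_class["cpu"]["model"] == inspected["cpu_model"]
        and resource_class["memory"]["size"] * MB_PER_GB == inspected["memory_mb"]
        # List equality checks drive count and sizes in array order.
        and expected_drives == inspected["drive_sizes_gb"]
        # Assumption: nic_count is a minimum; defaults to 0 if omitted.
        and inspected["nic_count"] >= resource_class.get("nic_count", 0)
    )


def match_resource_class(device_types: list, inspected: dict) -> str:
    """Return the matching resource class name, or raise if none match."""
    for device_type in device_types:
        if (
            device_type["manufacturer"] != inspected["manufacturer"]
            or device_type["model"] != inspected["model"]
        ):
            continue
        for resource_class in device_type.get("resource_class", []):
            if _matches(resource_class, inspected):
                return resource_class["name"]
    raise NoMatchingResourceClass(
        f"no resource class for {inspected['manufacturer']} {inspected['model']}"
    )


device_types = [{
    "manufacturer": "Dell",
    "model": "PowerEdge R7615",
    "resource_class": [
        {"name": "m1.small", "cpu": {"cores": 16, "model": "AMD EPYC 9124"},
         "memory": {"size": 128}, "drives": [{"size": 480}, {"size": 480}]},
        {"name": "m1.medium", "cpu": {"cores": 32, "model": "AMD EPYC 9334"},
         "memory": {"size": 256}, "drives": [{"size": 960}, {"size": 960}]},
    ],
}]
inspected = {
    "manufacturer": "Dell", "model": "PowerEdge R7615",
    "cpu_cores": 32, "cpu_model": "AMD EPYC 9334",
    "memory_mb": 256 * MB_PER_GB, "drive_sizes_gb": [960, 960], "nic_count": 2,
}
print(match_resource_class(device_types, inspected))  # m1.medium
```

Raising an exception on no match mirrors the "no match fallback" rule: a node with an unrecognized build is surfaced as a failure rather than silently assigned a nearby class.
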
## Management Workflow

Device types are managed through the `understackctl` CLI tool:

**Adding new device types:**

1. Create new device type definitions as YAML files
2. Validate and add with `understackctl device-type add <file>` (automatically updates Kustomization)
3. Commit to Git and submit pull request
4. ArgoCD detects changes and updates ConfigMap
5. Ironic inspection hook reads updated ConfigMap and uses new definitions for matching

**Updating existing device types:**

1. Edit the device type file in `$UC_DEPLOY/hardware/device-types/`
2. Validate with `understackctl device-type validate <file>`
3. Commit to Git and submit pull request
4. ArgoCD detects changes and updates ConfigMap

See the [operator guide](../operator-guide/device-types.md) for detailed
command usage and examples.
