From bb40d6f101aa1956c3073440501e6670a04ca049 Mon Sep 17 00:00:00 2001 From: Logan Blyth Date: Mon, 24 Mar 2025 17:06:17 -0400 Subject: [PATCH] docs: complete fab.yaml file Also remove mentions of vlab outside of vlab section. Take Pau's suggestion to document password hash generation. Add links to external telemetry. Signed-off-by: Logan Blyth Apply suggestions from Quentin Co-authored-by: Quentin Monnet --- docs/concepts/overview.md | 15 +- docs/install-upgrade/build-wiring.md | 61 ++----- docs/install-upgrade/config.md | 241 ++++++++++++--------------- docs/install-upgrade/install.md | 2 +- docs/user-guide/grafana.md | 2 +- 5 files changed, 124 insertions(+), 197 deletions(-) diff --git a/docs/concepts/overview.md b/docs/concepts/overview.md index 5f4fb40b..04aba5b3 100644 --- a/docs/concepts/overview.md +++ b/docs/concepts/overview.md @@ -51,16 +51,17 @@ Wiring Diagram consists of the following resources: ## Fabricator -Installer builder and VLAB. - -* Installer builder based on a preset (currently: `vlab` for virtual and `lab` for physical) - * Main input: Wiring Diagram - * All input artifacts coming from OCI registry - * Always full airgap (everything running from private registry) +Creates installation media. 
+ +* Features of fabricator: + * Inputs: [Wiring Diagram](../install-upgrade/build-wiring.md) and + [Config](../install-upgrade/config.md) + * All input artifacts delivered via OCI registry + * Capable of full airgap (everything running from private registry) + installation * Flatcar Linux for Control Node, generated `ignition.json` * Automatic K3s installation and private registry setup * All components and their dependencies running in Kubernetes -* Integrated Virtual Lab (VLAB) management * Future: * In-cluster (control) Operator to manage all components * Upgrades handling for everything starting Control Node OS diff --git a/docs/install-upgrade/build-wiring.md b/docs/install-upgrade/build-wiring.md index 9b7f669f..0f328335 100644 --- a/docs/install-upgrade/build-wiring.md +++ b/docs/install-upgrade/build-wiring.md @@ -3,55 +3,18 @@ ## Overview -A wiring diagram is a YAML file that is a digital representation of your network. You can find more YAML level details in the User Guide section [switch features and port naming](../user-guide/profiles.md) and the [api](../reference/api.md). It's mandatory for all switches to reference a `SwitchProfile` in the `spec.profile` of the `Switch` object. Only port naming defined by switch profiles could be used in the wiring diagram, NOS (or any other) port names aren't supported. 
- -In the meantime, to have a look at working wiring diagram for Hedgehog Fabric, run the sample generator that produces -working wiring diagrams: - -```console -ubuntu@sl-dev:~$ hhfab sample -h - -NAME: - hhfab sample - generate sample wiring diagram - -USAGE: - hhfab sample command [command options] - -COMMANDS: - spine-leaf, sl generate sample spine-leaf wiring diagram - collapsed-core, cc generate sample collapsed-core wiring diagram - help, h Shows a list of commands or help for one command - -OPTIONS: - --help, -h show help -``` - -Or you can generate a wiring diagram for a VLAB environment with flags to customize number of switches, links, servers, etc.: - -```console -ubuntu@sl-dev:~$ hhfab vlab gen --help -NAME: - hhfab vlab generate - generate VLAB wiring diagram - -USAGE: - hhfab vlab generate [command options] - -OPTIONS: - --bundled-servers value number of bundled servers to generate for switches (only for one of the second switch in the redundancy group or orphan switch) (default: 1) - --eslag-leaf-groups value eslag leaf groups (comma separated list of number of ESLAG switches in each group, should be 2-4 per group, e.g. 
2,4,2 for 3 groups with 2, 4 and 2 switches) - --eslag-servers value number of ESLAG servers to generate for ESLAG switches (default: 2) - --fabric-links-count value number of fabric links if fabric mode is spine-leaf (default: 0) - --help, -h show help - --mclag-leafs-count value number of mclag leafs (should be even) (default: 0) - --mclag-peer-links value number of mclag peer links for each mclag leaf (default: 0) - --mclag-servers value number of MCLAG servers to generate for MCLAG switches (default: 2) - --mclag-session-links value number of mclag session links for each mclag leaf (default: 0) - --no-switches do not generate any switches (default: false) - --orphan-leafs-count value number of orphan leafs (default: 0) - --spines-count value number of spines if fabric mode is spine-leaf (default: 0) - --unbundled-servers value number of unbundled servers to generate for switches (only for one of the first switch in the redundancy group or orphan switch) (default: 1) - --vpc-loopbacks value number of vpc loopbacks for each switch (default: 0) -``` +A wiring diagram is a YAML file that is a digital representation of your +network. You can find more YAML-level details in the User Guide section [switch +features and port naming](../user-guide/profiles.md) and the +[api](../reference/api.md). It's mandatory for all switches to reference a +`SwitchProfile` in the `spec.profile` of the `Switch` object. Only port names +defined by switch profiles can be used in the wiring diagram; NOS (or any +other) port names aren't supported. A complete example wiring diagram is +[below](build-wiring.md#sample-wiring-diagram). + +A good place to start building a wiring diagram is with the switch profiles. +Start with the switches, then move on to the fabric links, and finally the +server connections. 
### Sample Switch Configuration ``` { .yaml .annotate linenums="1" } diff --git a/docs/install-upgrade/config.md b/docs/install-upgrade/config.md index b2983e0d..bb93ef24 100644 --- a/docs/install-upgrade/config.md +++ b/docs/install-upgrade/config.md @@ -1,30 +1,23 @@ # Fabric Configuration ## Overview -The `fab.yaml` file is the configuration file for the fabric. It supplies the configuration of the users, their credentials, logging, telemetry, and other non wiring related settings. The `fab.yaml` file is composed of multiple YAML documents inside of a single file. Per the YAML spec 3 hyphens (`---`) on a single line separate the end of one document from the beginning of the next. There are two YAML documents in the `fab.yaml` file. For more information about how to use `hhfab init`, run `hhfab init --help`. +The `fab.yaml` file is the configuration file for the fabric. It supplies +the configuration of the users, their credentials, logging, telemetry, and +other non-wiring-related settings. The `fab.yaml` file is composed of multiple +YAML documents inside of a single file. Per the YAML spec, three hyphens (`---`) on +a single line separate the end of one document from the beginning of the next. +There are two YAML documents in the `fab.yaml` file. For more information about +how to use `hhfab init`, run `hhfab init --help`. +## HHFAB workflow -## Typical HHFAB workflows +After `hhfab` has been [downloaded](../getting-started/download.md): -### HHFAB for VLAB - -For a VLAB user, the typical workflow with hhfab is: - -1. `hhfab init --dev` -1. `hhfab vlab gen` -1. `hhfab vlab up` - -The above workflow will get a user up and running with a spine-leaf VLAB. - -### HHFAB for Physical Machines - -It's possible to start from scratch: - 1. `hhfab init` (see different flags to customize initial configuration) 1. Adjust the `fab.yaml` file to your needs 1. `hhfab validate` 1. 
`hhfab build` -Or import existing config and wiring files: +Or import existing `fab.yaml` and wiring files: 1. `hhfab init -c fab.yaml -w wiring-file.yaml -w extra-wiring-file.yaml` 1. `hhfab validate` @@ -32,118 +25,9 @@ Or import existing config and wiring files: After the above workflow a user will have a .img file suitable for installing the control node, then bringing up the switches which comprise the fabric. -## Fab.yaml - -### Configure control node and switch users - -Configuring control node and switch users is done either passing `--default-password-hash` to `hhfab init` or editing the resulting `fab.yaml` file emitted by `hhfab init`. You can specify users to be configured on the control node(s) and switches in the following format: - -``` {.yaml .annotation linenums="1"} -spec: - config: - control: - defaultUser: # user 'core' on all control nodes - password: "hashhashhashhashhash" # password hash - authorizedKeys: - - "ssh-ed25519 SecREKeyJumblE" - - fabric: - mode: spine-leaf # "spine-leaf" or "collapsed-core" - - defaultSwitchUsers: - admin: # at least one user with name 'admin' and role 'admin' - role: admin - #password: "$5$8nAYPGcl4..." # password hash - #authorizedKeys: # optional SSH authorized keys - # - "ssh-ed25519 AAAAC3Nza..." - op: # optional read-only user - role: operator - #password: "$5$8nAYPGcl4..." # password hash - #authorizedKeys: # optional SSH authorized keys - # - "ssh-ed25519 AAAAC3Nza..." - -``` - -Control node(s) user is always named `core`. - -The role of the user,`operator` is read-only access to `sonic-cli` command on the switches. In order to avoid conflicts, do not use the following usernames: `operator`,`hhagent`,`netops`. - -### NTP and DHCP -The control node uses public ntp servers from cloudflare and google by default. The control node runs a dhcp server on the management network. See the [example file](#complete-example-file). 
- -## Control Node -The control node is the host that manages all the switches, runs k3s, and serves images. This is the YAML document configure the control node: -``` {.yaml .annotation linenums="1"} -apiVersion: fabricator.githedgehog.com/v1beta1 -kind: ControlNode -metadata: - name: control-1 - namespace: fab -spec: - bootstrap: - disk: "/dev/sda" # disk to install OS on, e.g. "sda" or "nvme0n1" - external: - interface: enp2s0 # interface for external - ip: dhcp # IP address for external interface - management: - interface: enp2s1 # interface for management - -# Currently only one ControlNode is supported -``` -The **management** interface is for the control node to manage the fabric switches, *not* end-user management of the control node. For end-user management of the control node specify the **external** interface name. - -### Forward switch metrics and logs - -There is an option to enable Grafana Alloy on all switches to forward metrics and logs to the configured targets using -Prometheus Remote-Write API and Loki API. If those APIs are available from Control Node(s), but not from the switches, -it's possible to enable HTTP Proxy on Control Node(s) that will be used by Grafana Alloy running on the switches to -access the configured targets. It could be done by passing `--control-proxy=true` to `hhfab init`. - -Metrics includes port speeds, counters, errors, operational status, transceivers, fans, power supplies, temperature -sensors, BGP neighbors, LLDP neighbors, and more. Logs include agent logs. - -Configuring the exporters and targets is currently only possible by editing the `fab.yaml` configuration file. An example configuration is provided below: - -``` {.yaml .annotation linenums="1"} -spec: - config: - ... 
- defaultAlloyConfig: - agentScrapeIntervalSeconds: 120 - unixScrapeIntervalSeconds: 120 - unixExporterEnabled: true - lokiTargets: - grafana_cloud: # target name, multiple targets can be configured - basicAuth: # optional - password: "" - username: "" - labels: # labels to be added to all logs - env: env-1 - url: https://logs-prod-021.grafana.net/loki/api/v1/push - useControlProxy: true # if the Loki API is not available from the switches directly, use the Control Node as a proxy - prometheusTargets: - grafana_cloud: # target name, multiple targets can be configured - basicAuth: # optional - password: "" - username: "" - labels: # labels to be added to all metrics - env: env-1 - sendIntervalSeconds: 120 - url: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/push - useControlProxy: true # if the Loki API is not available from the switches directly, use the Control Node as a proxy - unixExporterCollectors: # list of node-exporter collectors to enable, https://grafana.com/docs/alloy/latest/reference/components/prometheus.exporter.unix/#collectors-list - - cpu - - filesystem - - loadavg - - meminfo - collectSyslogEnabled: true # collect /var/log/syslog on switches and forward to the lokiTargets -``` - -For additional options, see the `AlloyConfig` [struct in Fabric repo](https://github.com/githedgehog/fabric/blob/master/api/meta/alloy.go). - ## Complete Example File -``` {.yaml .annotation linenums="1" title="fab.yaml"} +``` { .yaml .annotate title="fab.yaml" linenums="1"} apiVersion: fabricator.githedgehog.com/v1beta1 kind: Fabricator metadata: @@ -159,10 +43,10 @@ spec: - time.cloudflare.com - time1.google.com - defaultUser: # user 'core' on all control nodes - password: "hash..." # password hash + defaultUser: # username 'core' on all control nodes + password: "hash..." # generate hash with openssl passwd -5 authorizedKeys: - - "ssh-ed25519 hash..." + - "ssh-ed25519 key..." 
# generate ssh key with ssh-keygen fabric: mode: spine-leaf # "spine-leaf" or "collapsed-core" @@ -170,14 +54,14 @@ spec: defaultSwitchUsers: admin: # at least one user with name 'admin' and role 'admin' role: admin - password: "hash..." # password hash + password: "hash..." # generate hash with openssl passwd -5 authorizedKeys: - - "ssh-ed25519 hash..." + - "ssh-ed25519 key..." op: # optional read-only user role: operator - password: "hash..." # password hash + password: "hash..." # generate hash with openssl passwd -5 authorizedKeys: - - "ssh-ed25519 hash..." + - "ssh-ed25519 key..." # generate ssh key with ssh-keygen defaultAlloyConfig: agentScrapeIntervalSeconds: 120 @@ -187,13 +71,11 @@ spec: lokiTargets: lab: url: http://url.io:3100/loki/api/v1/push - useControlProxy: true labels: descriptive: name prometheusTargets: lab: url: http://url.io:9100/api/v1/push - useControlProxy: true labels: descriptive: name sendIntervalSeconds: 120 @@ -208,10 +90,91 @@ spec: bootstrap: disk: "/dev/sda" # disk to install OS on, e.g. "sda" or "nvme0n1" external: - interface: eno2 # interface for external + interface: eno2 # customer interface to manage control node ip: dhcp # IP address for external interface - management: + management: # interface that manages switches in private management network interface: eno1 # Currently only one ControlNode is supported ``` + +### Configure Control Node and Switch Users + +#### Control Node Users +Configuring control node and switch users is done either by passing +`--default-password-hash` to `hhfab init` or by editing the resulting `fab.yaml` +file emitted by `hhfab init`. The default username on the control node is +`core`. + +#### Switch Users +The example file defines two switch users: `admin` (role `admin`) and `op` (role `operator`). +The `operator` role has read-only access to the `sonic-cli` command on the switches. The `admin` role has +broad administrative power on the switch. 
+In order to avoid conflicts, do not use the following usernames: `operator`, `hhagent`, `netops`. + +### NTP and DHCP +The control node uses public NTP servers from Cloudflare and Google by default. +The control node runs a DHCP server on the management network. See the [example +file](#complete-example-file). + +### Control Node +The control node is the host that manages all the switches, runs K3s, and serves images. +The **management** interface is for the control node to manage the fabric +switches, *not* end-user management of the control node. For end-user +management of the control node, specify the **external** interface name. + +### Telemetry + +There is an option to enable [Grafana +Alloy](https://grafana.com/docs/alloy/latest/) on all switches to forward metrics and logs to the configured targets using the +[Prometheus Remote-Write +API](https://prometheus.io/docs/specs/prw/remote_write_spec/) and the Loki API. Metrics include port speeds, counters, +errors, operational status, transceivers, fans, power supplies, temperature +sensors, BGP neighbors, LLDP neighbors, and more. Logs include Hedgehog agent logs. + +Telemetry can be enabled after installation of the fabric. Save the following +YAML to a file on the control node and modify the fields as needed. Metrics and logs +can be pushed to a Grafana instance in the customer environment, or to Grafana +Cloud. 
+ +```{ .yaml title="telemetry.yaml" linenums="1" } +spec: + config: + fabric: + defaultAlloyConfig: + agentScrapeIntervalSeconds: 120 + unixScrapeIntervalSeconds: 120 + unixExporterEnabled: true + lokiTargets: + grafana_cloud: # target name, multiple targets can be configured + basicAuth: # optional + password: "" + username: "" + labels: # labels to be added to all logs + env: env-1 + url: https://logs-prod-021.grafana.net/loki/api/v1/push + prometheusTargets: + grafana_cloud: # target name, multiple targets can be configured + basicAuth: # optional + password: "" + username: "" + labels: # labels to be added to all metrics + env: env-1 + sendIntervalSeconds: 120 + url: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/push + unixExporterCollectors: # list of node-exporter collectors to enable, https://grafana.com/docs/alloy/latest/reference/components/prometheus.exporter.unix/#collectors-list + - cpu + - filesystem + - loadavg + - meminfo + collectSyslogEnabled: true # collect /var/log/syslog on switches and forward to the lokiTargets +``` + +To enable the telemetry after install use: + +``` shell +kubectl patch -n fab --type merge fabricator/default --patch-file telemetry.yaml +``` + +For additional options, see the `AlloyConfig` [struct in Fabric repo](https://github.com/githedgehog/fabric/blob/master/api/meta/alloy.go). + diff --git a/docs/install-upgrade/install.md b/docs/install-upgrade/install.md index 582e3d60..6d3fab95 100644 --- a/docs/install-upgrade/install.md +++ b/docs/install-upgrade/install.md @@ -43,7 +43,7 @@ for writing to a USB flash drive or mounting via IPMI virtual media. The first ` run is `hhfab init`. This will generate the main configuration file, `fab.yaml`. `fab.yaml` is responsible for almost every configuration of the fabric with the exception of the wiring. Each command and subcommand have usage messages, simply supply the `-h` flag to your command or sub -command to see the available options. 
For example `hhfab vlab -h` and `hhfab vlab gen -h`. +command to see the available options. For example `hhfab init -h`. ### HHFAB commands to make a bootable image diff --git a/docs/user-guide/grafana.md b/docs/user-guide/grafana.md index b08507a9..8ffef31c 100644 --- a/docs/user-guide/grafana.md +++ b/docs/user-guide/grafana.md @@ -1,7 +1,7 @@ # Grafana Dashboards To provide monitoring for most critical metrics from the switches managed by Hedgehog Fabric there are several Dashboards that may be used in Grafana deployments. Make sure that you've enabled metrics and logs collection for the switches in the Fabric that is -described in [Fabric Config](../install-upgrade/config.md#forward-switch-metrics-and-logs) section. +described in [Fabric Config](../install-upgrade/config.md#telemetry) section. ## Variables List of common variables used in Hedgehog Grafana dashboards
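
The `fab.yaml` comments added in this patch point to `openssl passwd -5` and `ssh-keygen` for producing the `password` and `authorizedKeys` values. A minimal sketch of those two steps is given here for reference. The function names, the key path, and the key comment are hypothetical placeholders, and the sketch assumes OpenSSL 1.1.1 or later and OpenSSH are installed.

```shell
#!/bin/sh
# Sketch only: the function names, the key path, and the key comment are
# illustrative assumptions, not part of hhfab.

# Print a SHA-256 crypt hash ($5$<salt>$<hash>) for the given password.
# The password is piped on stdin rather than passed as an openssl argument.
hash_password() {
    printf '%s\n' "$1" | openssl passwd -5 -stdin
}

# Create an ed25519 key pair with an empty passphrase at the given path.
# The public half is written to "<path>.pub".
make_ssh_key() {
    ssh-keygen -q -t ed25519 -N "" -C "fabric-admin" -f "$1"
}
```

For example, `hash_password 'example-password'` prints a string starting with `$5$` that can be pasted into a `password:` field, and `make_ssh_key ./fabric_ed25519` writes the public key to `./fabric_ed25519.pub` for use in an `authorizedKeys:` entry.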