diff --git a/clusterloader2/cmd/clusterloader.go b/clusterloader2/cmd/clusterloader.go index 95d0d1da16..8d314e3afa 100644 --- a/clusterloader2/cmd/clusterloader.go +++ b/clusterloader2/cmd/clusterloader.go @@ -44,6 +44,7 @@ import ( "k8s.io/perf-tests/clusterloader2/pkg/util" _ "k8s.io/perf-tests/clusterloader2/pkg/dependency/dra" + _ "k8s.io/perf-tests/clusterloader2/pkg/dependency/kwok/dra" _ "k8s.io/perf-tests/clusterloader2/pkg/measurement/common" _ "k8s.io/perf-tests/clusterloader2/pkg/measurement/common/bundle" _ "k8s.io/perf-tests/clusterloader2/pkg/measurement/common/dns" diff --git a/clusterloader2/pkg/dependency/kwok/README.md b/clusterloader2/pkg/dependency/kwok/README.md new file mode 100644 index 0000000000..81a891a7a9 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/README.md @@ -0,0 +1,197 @@ +# KWOK DRA Dependency + +This dependency provides fake Kubernetes nodes with Dynamic Resource Allocation (DRA) GPU resources using [KWOK (Kubernetes WithOut Kubelet)](https://kwok.sigs.k8s.io/). + +## What it does + +- Installs KWOK controller in `kwok-system` namespace +- Creates fake nodes with GPU resources exposed through DRA ResourceSlices +- Enables testing DRA workloads without real GPU hardware + +## Configuration + +Add the dependency to your ClusterLoader2 test configuration: + +```yaml +# In your CL2 test config +dependencyConfigs: +- name: kwok-dra + params: + nodes: 4 # Number of fake nodes (default: 2) + gpusPerNode: 16 # GPUs per node (default: 8) + timeout: "10m" # Setup timeout (default: 5m) +``` + +## Fake Resources Created + +### Nodes +- **Name**: `kwok-node-0`, `kwok-node-1`, etc. 
+- **Resources**: 32 CPU, 256Gi memory, 110 pods +- **Labels**: `type=kwok`, `kubernetes.io/hostname=kwok-node-N` +- **Taints**: `kwok.x-k8s.io/node=fake:NoSchedule` (keeps pods without a matching toleration off the fake nodes) + +### GPU Resources (DRA) +- **Driver**: `cl2-gpu.kwok.x-k8s.io` +- **API Version**: `resource.k8s.io/v1beta2` +- **ResourceSlices**: One per node with configurable GPU devices +- **Device Names**: `gpu0`, `gpu1`, `gpu2`, etc. +- **Capacity**: Each device provides `1` GPU unit (`cl2-gpu.kwok.x-k8s.io/gpu`) +- **Device Attributes**: + - `gpu_type`: "kwok_gpu" + - `memory`: "8Gi" + - `compute_capability`: "7.5" + +## Example: Simple GPU Job + +Create a job that requests fake GPUs using the provided example files: + +### 1. GPU Job Template +See [`examples/kwok-gpu-job.yaml`](examples/kwok-gpu-job.yaml) - A ClusterLoader2 job template that: +- Uses the `job-type: short-lived` label for KWOK completion simulation +- Includes the tolerations required by KWOK fake nodes +- References the GPU ResourceClaimTemplate +- Uses templating variables (`{{.Name}}`, `{{.Replicas}}`, etc.) + +### 2. ResourceClaimTemplate +See [`examples/kwok-gpu-resource-claim-template.yaml`](examples/kwok-gpu-resource-claim-template.yaml) - A ResourceClaimTemplate that: +- Uses the `resource.k8s.io/v1beta2` API to request GPU devices +- References the `cl2-gpu.kwok.x-k8s.io` DeviceClass +- Must be created in a separate step before the jobs that reference it + +## Running Tests with KWOK DRA + +### Using the Main E2E Script + +The `run-e2e.sh` script is the main entry point for running performance tests in the perf-tests repository. + +```bash +# Basic usage from perf-tests root directory +./run-e2e.sh [options...] 
+ +# Run a ClusterLoader2 test with the KWOK DRA dependency +./run-e2e.sh cluster-loader2 \ + --testconfig=pkg/dependency/kwok/examples/test-config.yaml \ + --provider=skeleton \ + --nodes=3 \ + --report-dir=/tmp/reports + +# Same test with a different node count +./run-e2e.sh cluster-loader2 \ + --testconfig=pkg/dependency/kwok/examples/test-config.yaml \ + --provider=skeleton \ + --nodes=5 \ + --report-dir=/tmp/reports + +# View available test tools +./run-e2e.sh --help +``` + +### Quick Start + +1. **Prerequisites**: Ensure you have a Kubernetes cluster running +2. **Environment**: Set `KUBECONFIG`, or make sure `~/.kube/config` points to your cluster +3. **Run Test**: Execute the script with the desired parameters +4. **Results**: Check the `--report-dir` directory for test results and metrics + +### Available Test Tools + +The `run-e2e.sh` script supports multiple performance testing tools: + +- **`cluster-loader2`** - Kubernetes cluster performance and scale testing +- **`network-performance`** - Network performance benchmarks +- **`kube-dns`** - DNS performance testing +- **`core-dns`** - CoreDNS performance testing +- **`node-local-dns`** - NodeLocalDNS performance testing + +### Example ClusterLoader2 Test Config + +Use the provided test configuration file: + +```bash +# Copy the example config to your test directory +cp pkg/dependency/kwok/examples/test-config.yaml your-test-config.yaml + +# Or reference it directly +./run-e2e.sh cluster-loader2 \ + --testconfig=pkg/dependency/kwok/examples/test-config.yaml \ + --provider=kind \ + --nodes=3 \ + --report-dir=/tmp/kwok-dra-test +``` + +The [`examples/test-config.yaml`](examples/test-config.yaml) includes: +- **KWOK DRA dependency** with 3 nodes and 8 GPUs per node +- **ResourceClaimTemplate creation** step (must run before jobs) +- **10 GPU jobs** requesting fake GPU resources +- **QPS throttling** for controlled job creation + +## Job Timing Configuration + +To control how long simulated jobs run before completing: + +```bash +# Set 
job duration to 10 seconds (10000ms) +export CL2_JOB_RUNNING_TIME_MS=10000 + +# Run your ClusterLoader2 test +./clusterloader2 --testconfig=test-config.yaml +``` + +This affects all jobs whose pods carry the `job-type: short-lived` label and run on KWOK nodes. + +## Important Notes + +1. **Tolerations Required**: Jobs must tolerate the `kwok.x-k8s.io/node=fake:NoSchedule` taint +2. **DeviceClass**: Use the `cl2-gpu.kwok.x-k8s.io` DeviceClass, which the dependency installs during setup +3. **Step Ordering**: ResourceClaimTemplate must be created before jobs that reference it +4. **Resource Dependencies**: Jobs depend on both DeviceClass (from dependency) and ResourceClaimTemplate (from first step) +5. **Job Labels**: Pods must have the `job-type: short-lived` label for KWOK job completion simulation +6. **Job Completion**: Set the `CL2_JOB_RUNNING_TIME_MS` environment variable to control simulated job duration (default: 30000ms) +7. **Device Attributes**: Each device in a `resource.k8s.io/v1beta2` ResourceSlice exposes string attributes (`gpu_type`, `memory`, `compute_capability`) +8. **ResourceSlice Labels**: Each ResourceSlice carries the label `resource.k8s.io/driver=cl2-gpu.kwok.x-k8s.io`, which the dependency uses to find and delete its slices during teardown +9. **Fake Resources**: GPUs are simulated - no actual GPU operations occur +10. 
**Cleanup**: The dependency automatically cleans up when tests complete + +## Troubleshooting + +### Common Issues + +- **Nodes not ready**: Check KWOK controller logs in `kwok-system` namespace +- **Jobs not scheduling**: Verify tolerations and DeviceClass configuration +- **Timeout errors**: Increase the `timeout` parameter in dependency config +- **ResourceClaimTemplate not found**: Ensure the ResourceClaimTemplate step runs before job creation + +### Testing the Setup + +```bash +# Test KWOK DRA dependency with example config +./run-e2e.sh cluster-loader2 \ + --testconfig=clusterloader2/pkg/dependency/kwok/examples/test-config.yaml \ + --provider=kind \ + --nodes=3 \ + --report-dir=/tmp/kwok-test + +# Check KWOK nodes are created +kubectl get nodes -l type=kwok + +# Verify ResourceSlices are available +kubectl get resourceslices + +# Check DeviceClass is installed +kubectl get deviceclasses cl2-gpu.kwok.x-k8s.io +``` + +### Debug Mode + +Enable verbose logging by setting environment variables: + +```bash +export KLOG_V=2 +./run-e2e.sh cluster-loader2 --testconfig=... --v=2 +``` + +## See Also + +- [KWOK Documentation](https://kwok.sigs.k8s.io/) +- [Kubernetes DRA Documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) +- [ClusterLoader2 Documentation](../../../docs/) \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/dra/kwok.go b/clusterloader2/pkg/dependency/kwok/dra/kwok.go new file mode 100644 index 0000000000..a84f765416 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/kwok.go @@ -0,0 +1,499 @@ +/* +Copyright 2025 The Kubernetes Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package dra + +import ( + "context" + "embed" + "fmt" + "time" + + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/util/wait" + "k8s.io/klog/v2" + + v1 "k8s.io/api/core/v1" + "k8s.io/apimachinery/pkg/api/errors" + "k8s.io/apimachinery/pkg/api/resource" + "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" + "k8s.io/apimachinery/pkg/runtime/schema" + "k8s.io/perf-tests/clusterloader2/pkg/dependency" + "k8s.io/perf-tests/clusterloader2/pkg/framework/client" + "k8s.io/perf-tests/clusterloader2/pkg/util" +) + +const ( + kwokDRADependencyName = "DRAKWOKDriver" + kwokNamespace = "kwok-system" + kwokControllerDeployment = "kwok-controller" + checkKwokReadyInterval = 15 * time.Second + defaultKwokReadyTimeout = 5 * time.Minute +) + +//go:embed manifests/*.yaml manifests/crds/*.yaml +var manifestsFS embed.FS + +func init() { + if err := dependency.Register(kwokDRADependencyName, createKWOKDRADependency); err != nil { + klog.Fatalf("Cannot register %s: %v", kwokDRADependencyName, err) + } +} + +func createKWOKDRADependency() dependency.Dependency { + return &kwokDRADependency{} +} + +type kwokDRADependency struct{} + +// Setup installs the KWOK controller and accompanying fake DRA resources. 
+func (d *kwokDRADependency) Setup(cfg *dependency.Config) error { + klog.V(2).Infof("%s: installing KWOK controller", d) + + if err := client.CreateNamespace(cfg.ClusterFramework.GetClientSets().GetClient(), kwokNamespace); err != nil { + return fmt.Errorf("namespace %s creation error: %v", kwokNamespace, err) + } + + coreManifests := []string{ + "manifests/service_account.yaml", + "manifests/role.yaml", + "manifests/role_binding.yaml", + "manifests/kwok.yaml", + "manifests/service.yaml", + "manifests/deployment.yaml", + "manifests/flow_schema.yaml", + "manifests/device-class.yaml", + } + + crdManifests := []string{ + "manifests/crds/kwok.x-k8s.io_attaches.yaml", + "manifests/crds/kwok.x-k8s.io_clusterattaches.yaml", + "manifests/crds/kwok.x-k8s.io_clusterexecs.yaml", + "manifests/crds/kwok.x-k8s.io_clusterlogs.yaml", + "manifests/crds/kwok.x-k8s.io_clusterportforwards.yaml", + "manifests/crds/kwok.x-k8s.io_clusterresourceusages.yaml", + "manifests/crds/kwok.x-k8s.io_execs.yaml", + "manifests/crds/kwok.x-k8s.io_logs.yaml", + "manifests/crds/kwok.x-k8s.io_metrics.yaml", + "manifests/crds/kwok.x-k8s.io_portforwards.yaml", + "manifests/crds/kwok.x-k8s.io_resourceusages.yaml", + "manifests/crds/kwok.x-k8s.io_stages.yaml", + } + + for _, manifest := range crdManifests { + klog.V(2).Infof("%s: applying CRD %s", d, manifest) + if err := cfg.ClusterFramework.ApplyTemplatedManifests( + manifestsFS, + manifest, + map[string]interface{}{}, + client.Retry(client.IsRetryableAPIError), + ); err != nil { + return fmt.Errorf("applying CRD %s error: %v", manifest, err) + } + } + + for _, manifest := range coreManifests { + klog.V(2).Infof("%s: applying %s", d, manifest) + if err := cfg.ClusterFramework.ApplyTemplatedManifests( + manifestsFS, + manifest, + map[string]interface{}{}, + client.Retry(client.IsRetryableAPIError), + ); err != nil { + return fmt.Errorf("applying %s error: %v", manifest, err) + } + klog.V(4).Infof("%s: successfully applied %s", d, manifest) + } + + 
timeout, err := dependencyWaitTimeout(cfg, defaultKwokReadyTimeout) + if err != nil { + return err + } + klog.V(2).Infof("%s: waiting up to %v for KWOK controller to be ready", d, timeout) + if err := d.waitForKWOKToBeHealthy(cfg, timeout); err != nil { + return err + } + + // Now apply Stage manifests (KWOK CRDs are now registered). + // Apply in order: node stages first (so nodes can become ready), then pod stages, then job stages + stageManifests := []string{ + "manifests/stage_fast_node_initialize.yaml", + "manifests/stage_fast_node.yaml", + "manifests/stage_fast_pod_ready.yaml", + "manifests/stage_fast_pod_delete.yaml", + "manifests/job-completion-stages.yaml", + } + + for _, stageManifest := range stageManifests { + klog.V(2).Infof("%s: applying Stage manifest %s", d, stageManifest) + if err := cfg.ClusterFramework.ApplyTemplatedManifests( + manifestsFS, + stageManifest, + map[string]interface{}{}, + client.Retry(client.IsRetryableAPIError), + ); err != nil { + return fmt.Errorf("applying Stage manifest %s error: %v", stageManifest, err) + } + } + + nodes, err := util.GetIntOrDefault(cfg.Params, "nodes", 2) + if err != nil { + return fmt.Errorf("invalid nodes param: %v", err) + } + gpusPerNode, err := util.GetIntOrDefault(cfg.Params, "gpusPerNode", 8) + if err != nil { + return fmt.Errorf("invalid gpusPerNode param: %v", err) + } + + if err := d.createFakeClusterObjects(cfg, nodes, gpusPerNode); err != nil { + return fmt.Errorf("creating fake cluster objects: %v", err) + } + + klog.V(2).Infof("%s: waiting for %d nodes to be ready", d, nodes) + if err := d.waitForNodesToBeReady(cfg, nodes, timeout); err != nil { + return fmt.Errorf("waiting for nodes to be ready: %v", err) + } + + klog.V(2).Infof("%s: KWOK controller along with dra devices installed successfully", d) + return nil +} + +func (d *kwokDRADependency) Teardown(cfg *dependency.Config) error { + klog.V(2).Infof("%s: tearing down KWOK controller and cluster resources", d) + + clientset := 
cfg.ClusterFramework.GetClientSets().GetClient() + dynamicClient := cfg.ClusterFramework.GetDynamicClients().GetClient() + + nodeList, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{ + LabelSelector: "type=kwok", + }) + if err != nil { + klog.Warningf("%s: failed to list KWOK nodes: %v", d, err) + } else { + for _, node := range nodeList.Items { + if err := clientset.CoreV1().Nodes().Delete(context.Background(), node.Name, metav1.DeleteOptions{}); err != nil { + if !errors.IsNotFound(err) { + klog.Warningf("%s: failed to delete node %s: %v", d, node.Name, err) + } + } else { + klog.V(3).Infof("%s: deleted KWOK node %s", d, node.Name) + } + } + + // Give KWOK Stages time to simulate pod cleanup after node deletion + if len(nodeList.Items) > 0 { + klog.V(3).Infof("%s: waiting for KWOK to simulate pod cleanup after node deletion", d) + time.Sleep(5 * time.Second) + } + } + + stageGVR := schema.GroupVersionResource{Group: "kwok.x-k8s.io", Version: "v1alpha1", Resource: "stages"} + stageNames := []string{"job-complete-short", "job-complete-long", "node-heartbeat-with-lease", "node-initialize", "pod-delete", "pod-ready"} + for _, stageName := range stageNames { + if err := dynamicClient.Resource(stageGVR).Delete(context.Background(), stageName, metav1.DeleteOptions{}); err != nil { + if !errors.IsNotFound(err) { + klog.Warningf("%s: failed to delete Stage %s: %v", d, stageName, err) + } + } else { + klog.V(3).Infof("%s: deleted Stage %s", d, stageName) + } + } + + resourceSliceGVR := schema.GroupVersionResource{Group: "resource.k8s.io", Version: "v1beta2", Resource: "resourceslices"} + sliceList, err := dynamicClient.Resource(resourceSliceGVR).List(context.Background(), metav1.ListOptions{ + LabelSelector: "resource.k8s.io/driver=cl2-gpu.kwok.x-k8s.io", + }) + if err != nil { + klog.Warningf("%s: failed to list ResourceSlices: %v", d, err) + } else { + for _, slice := range sliceList.Items { + sliceName := slice.GetName() + if err := 
dynamicClient.Resource(resourceSliceGVR).Delete(context.Background(), sliceName, metav1.DeleteOptions{}); err != nil { + if !errors.IsNotFound(err) { + klog.Warningf("%s: failed to delete ResourceSlice %s: %v", d, sliceName, err) + } + } else { + klog.V(3).Infof("%s: deleted ResourceSlice %s", d, sliceName) + } + } + } + + deviceClassGVR := schema.GroupVersionResource{Group: "resource.k8s.io", Version: "v1beta2", Resource: "deviceclasses"} + deviceClassName := "cl2-gpu.kwok.x-k8s.io" + if err := dynamicClient.Resource(deviceClassGVR).Delete(context.Background(), deviceClassName, metav1.DeleteOptions{}); err != nil { + if !errors.IsNotFound(err) { + klog.Warningf("%s: failed to delete DeviceClass %s: %v", d, deviceClassName, err) + } + } else { + klog.V(3).Infof("%s: deleted DeviceClass %s", d, deviceClassName) + } + + if err := client.DeleteNamespace(clientset, kwokNamespace); err != nil { + return fmt.Errorf("deleting %s namespace error: %v", kwokNamespace, err) + } + if err := client.WaitForDeleteNamespace(clientset, kwokNamespace, client.DefaultNamespaceDeletionTimeout); err != nil { + return err + } + + klog.V(2).Infof("%s: KWOK controller and all cluster resources removed", d) + return nil +} + +func (d *kwokDRADependency) waitForKWOKToBeHealthy(cfg *dependency.Config, timeout time.Duration) error { + return wait.PollUntilContextTimeout(context.TODO(), checkKwokReadyInterval, timeout, true, func(ctx context.Context) (bool, error) { + return d.isKWOKReady(ctx, cfg) + }) +} + +func (d *kwokDRADependency) isKWOKReady(ctx context.Context, cfg *dependency.Config) (bool, error) { + deploy, err := cfg.ClusterFramework.GetClientSets().GetClient().AppsV1().Deployments(kwokNamespace).Get(ctx, kwokControllerDeployment, metav1.GetOptions{}) + if err != nil { + if errors.IsNotFound(err) { + klog.V(4).Infof("KWOK deployment not found yet, continuing to wait...") + return false, nil // Not ready yet, but not an error - continue polling + } + return false, fmt.Errorf("failed 
to get KWOK deployment: %v", err) + } + ready := deploy.Status.ReadyReplicas == *deploy.Spec.Replicas && deploy.Status.ReadyReplicas > 0 + if !ready { + klog.V(4).Infof("KWOK controller not ready: replicas %d/%d", deploy.Status.ReadyReplicas, *deploy.Spec.Replicas) + } + return ready, nil +} + +func (d *kwokDRADependency) waitForNodesToBeReady(cfg *dependency.Config, expectedNodes int, timeout time.Duration) error { + return wait.PollUntilContextTimeout(context.TODO(), checkKwokReadyInterval, timeout, true, func(ctx context.Context) (bool, error) { + return d.areNodesReady(ctx, cfg, expectedNodes) + }) +} + +func (d *kwokDRADependency) areNodesReady(ctx context.Context, cfg *dependency.Config, expectedNodes int) (bool, error) { + clientset := cfg.ClusterFramework.GetClientSets().GetClient() + + nodeList, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{ + LabelSelector: "type=kwok", + }) + if err != nil { + return false, fmt.Errorf("failed to list KWOK nodes: %v", err) + } + + if len(nodeList.Items) != expectedNodes { + klog.V(4).Infof("Expected %d KWOK nodes, found %d", expectedNodes, len(nodeList.Items)) + return false, nil + } + + readyNodes := 0 + for _, node := range nodeList.Items { + for _, condition := range node.Status.Conditions { + if condition.Type == v1.NodeReady && condition.Status == v1.ConditionTrue { + readyNodes++ + break + } + } + } + + if readyNodes != expectedNodes { + klog.V(4).Infof("Expected %d ready KWOK nodes, found %d ready", expectedNodes, readyNodes) + return false, nil + } + + klog.V(3).Infof("All %d KWOK nodes are ready", expectedNodes) + return true, nil +} + +// createFakeClusterObjects creates Node and matching ResourceSlice objects for KWOK. 
+func (d *kwokDRADependency) createFakeClusterObjects(cfg *dependency.Config, nodeCount, gpusPerNode int) error { + clientset := cfg.ClusterFramework.GetClientSets().GetClient() + + for i := 0; i < nodeCount; i++ { + nodeName := fmt.Sprintf("kwok-node-%d", i) + + // Prepare Node object + node := &v1.Node{ + ObjectMeta: metav1.ObjectMeta{ + Name: nodeName, + Labels: map[string]string{ + "beta.kubernetes.io/arch": "amd64", + "beta.kubernetes.io/os": "linux", + "kubernetes.io/arch": "amd64", + "kubernetes.io/os": "linux", + "kubernetes.io/role": "agent", + "node-role.kubernetes.io/agent": "", + "type": "kwok", + "kubernetes.io/hostname": nodeName, + }, + Annotations: map[string]string{ + "node.alpha.kubernetes.io/ttl": "0", + "kwok.x-k8s.io/node": "fake", + }, + }, + Spec: v1.NodeSpec{ + Taints: []v1.Taint{{ + Key: "kwok.x-k8s.io/node", + Value: "fake", + Effect: v1.TaintEffectNoSchedule, + }}, + }, + Status: v1.NodeStatus{ + Capacity: v1.ResourceList{ + v1.ResourceCPU: resource.MustParse("32"), + v1.ResourceMemory: resource.MustParse("256Gi"), + v1.ResourcePods: resource.MustParse("110"), + }, + Allocatable: v1.ResourceList{ + v1.ResourceCPU: resource.MustParse("32"), + v1.ResourceMemory: resource.MustParse("256Gi"), + v1.ResourcePods: resource.MustParse("110"), + }, + Conditions: []v1.NodeCondition{ + { + Type: v1.NodeReady, + Status: v1.ConditionTrue, + LastHeartbeatTime: metav1.Now(), + LastTransitionTime: metav1.Now(), + Reason: "KubeletReady", + Message: "kubelet is posting ready status. 
AppArmor enabled", + }, + { + Type: v1.NodeMemoryPressure, + Status: v1.ConditionFalse, + LastHeartbeatTime: metav1.Now(), + LastTransitionTime: metav1.Now(), + Reason: "KubeletHasSufficientMemory", + Message: "kubelet has sufficient memory available", + }, + { + Type: v1.NodeDiskPressure, + Status: v1.ConditionFalse, + LastHeartbeatTime: metav1.Now(), + LastTransitionTime: metav1.Now(), + Reason: "KubeletHasNoDiskPressure", + Message: "kubelet has no disk pressure", + }, + { + Type: v1.NodePIDPressure, + Status: v1.ConditionFalse, + LastHeartbeatTime: metav1.Now(), + LastTransitionTime: metav1.Now(), + Reason: "KubeletHasSufficientPID", + Message: "kubelet has sufficient PID available", + }, + }, + Phase: v1.NodeRunning, + NodeInfo: v1.NodeSystemInfo{ + MachineID: "fake-machine-id", + SystemUUID: "fake-system-uuid", + BootID: "fake-boot-id", + KernelVersion: "5.4.0-fake", + OSImage: "Ubuntu 20.04.1 LTS", + ContainerRuntimeVersion: "containerd://1.6.0-fake", + KubeletVersion: "v1.29.0-fake", + KubeProxyVersion: "v1.29.0-fake", + OperatingSystem: "linux", + Architecture: "amd64", + }, + }, + } + + // Create the Node; a Node that already exists is left unchanged + if _, err := clientset.CoreV1().Nodes().Create(context.Background(), node, metav1.CreateOptions{}); err != nil { + if !errors.IsAlreadyExists(err) { + return fmt.Errorf("creating node %s: %v", nodeName, err) + } + } + + // Build the ResourceSlice as an unstructured object (a typed v1beta2 client may not be available) + sliceName := fmt.Sprintf("kwok-gpu-node-%d", i) + gvr := schema.GroupVersionResource{Group: "resource.k8s.io", Version: "v1beta2", Resource: "resourceslices"} + + deviceList := make([]interface{}, 0, gpusPerNode) + for g := 0; g < gpusPerNode; g++ { + deviceList = append(deviceList, map[string]interface{}{ + "name": fmt.Sprintf("gpu%d", g), + "attributes": map[string]interface{}{ + "name": map[string]interface{}{ + "string": fmt.Sprintf("gpu_%d", g), + }, + "gpu_type": map[string]interface{}{ + "string": "kwok_gpu", + }, + "memory": map[string]interface{}{ 
+ "string": "8Gi", + }, + "compute_capability": map[string]interface{}{ + "string": "7.5", + }, + }, + }) + } + + sliceObj := &unstructured.Unstructured{Object: map[string]interface{}{ + "apiVersion": "resource.k8s.io/v1beta2", + "kind": "ResourceSlice", + "metadata": map[string]interface{}{ + "name": sliceName, + "labels": map[string]interface{}{ + "resource.k8s.io/driver": "cl2-gpu.kwok.x-k8s.io", + }, + }, + "spec": map[string]interface{}{ + "driver": "cl2-gpu.kwok.x-k8s.io", + "nodeName": nodeName, + "pool": map[string]interface{}{ + "name": fmt.Sprintf("kwok-gpu-pool-node-%d", i), + "generation": int64(1), + "resourceSliceCount": int64(1), + }, + "devices": deviceList, + }, + }} + + dynamicClient := cfg.ClusterFramework.GetDynamicClients().GetClient() + if _, err := dynamicClient.Resource(gvr).Create(context.Background(), sliceObj, metav1.CreateOptions{}); err != nil { + if !errors.IsAlreadyExists(err) { + return fmt.Errorf("creating resourceslice %s: %v", sliceName, err) + } + } + } + return nil +} + +func dependencyWaitTimeout(cfg *dependency.Config, def time.Duration) (time.Duration, error) { + // Look up the optional "timeout" entry in the dependency params map. + if cfg == nil || cfg.Params == nil { + return def, nil + } + return dependencyTimeoutFromParams(cfg.Params, def) +} + +func dependencyTimeoutFromParams(params map[string]interface{}, def time.Duration) (time.Duration, error) { + // Accept the timeout as a time.Duration or as a duration string such as "10m"; a value of any other type falls back to the default. 
+ if raw, ok := params["timeout"]; ok { + switch v := raw.(type) { + case time.Duration: + return v, nil + case string: + d, err := time.ParseDuration(v) + if err != nil { + return 0, fmt.Errorf("parsing timeout param: %v", err) + } + return d, nil + } + } + return def, nil +} + +func (d *kwokDRADependency) String() string { return kwokDRADependencyName } diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_attaches.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_attaches.yaml new file mode 100644 index 0000000000..e3066ad68e --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_attaches.yaml @@ -0,0 +1,124 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: attaches.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: Attach + listKind: AttachList + plural: attaches + singular: attach + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Attach provides attach configuration for a single pod. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for attach + properties: + attaches: + description: Attaches is a list of attaches to configure. + items: + description: AttachConfig holds information how to attach. + properties: + containers: + description: Containers is list of container names. + items: + type: string + type: array + logsFile: + description: LogsFile is the file from which the attach starts + type: string + type: object + type: array + required: + - attaches + type: object + status: + description: Status holds status for attach + properties: + conditions: + description: Conditions holds conditions for attach + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. 
+ Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. + The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterattaches.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterattaches.yaml new file mode 100644 index 0000000000..984db9d5cb --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterattaches.yaml @@ -0,0 +1,139 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: clusterattaches.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: ClusterAttach + listKind: ClusterAttachList + plural: clusterattaches + singular: clusterattach + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: ClusterAttach provides cluster-wide logging configuration + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for cluster attach. + properties: + attaches: + description: Attaches is a list of attach configurations. + items: + description: AttachConfig holds information how to attach. + properties: + containers: + description: Containers is list of container names. + items: + type: string + type: array + logsFile: + description: LogsFile is the file from which the attach starts + type: string + type: object + type: array + selector: + description: Selector is a selector to filter pods to configure. + properties: + matchNames: + description: |- + MatchNames is a list of names to match. + if not set, all names will be matched. + items: + type: string + type: array + matchNamespaces: + description: |- + MatchNamespaces is a list of namespaces to match. + if not set, all namespaces will be matched. + items: + type: string + type: array + type: object + required: + - attaches + type: object + status: + description: Status holds status for cluster attach + properties: + conditions: + description: Conditions holds conditions for cluster attach. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. 
+ format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. 
+ The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterexecs.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterexecs.yaml new file mode 100644 index 0000000000..902d3024c5 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterexecs.yaml @@ -0,0 +1,181 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: clusterexecs.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: ClusterExec + listKind: ClusterExecList + plural: clusterexecs + singular: clusterexec + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: ClusterExec provides cluster-wide exec configuration. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for cluster exec. + properties: + execs: + description: Execs is a list of exec to configure. + items: + description: ExecTarget holds information how to exec. + properties: + containers: + description: |- + Containers is a list of containers to exec. + if not set, all containers will be execed. + items: + type: string + type: array + local: + description: Local holds information how to exec to a local + target. + properties: + envs: + description: Envs is a list of environment variables to + exec with. + items: + description: EnvVar represents an environment variable + present in a Container. + properties: + name: + description: Name of the environment variable. + minLength: 1 + type: string + value: + description: Value of the environment variable. + type: string + required: + - name + type: object + type: array + securityContext: + description: SecurityContext is the user context to exec. + properties: + runAsGroup: + description: RunAsGroup is the existing gid to run exec + command in container process. + format: int64 + type: integer + runAsUser: + description: RunAsUser is the existing uid to run exec + command in container process. + format: int64 + type: integer + type: object + workDir: + description: WorkDir is the working directory to exec with. + type: string + type: object + type: object + type: array + selector: + description: Selector is a selector to filter pods to configure. + properties: + matchNames: + description: |- + MatchNames is a list of names to match. + if not set, all names will be matched. + items: + type: string + type: array + matchNamespaces: + description: |- + MatchNamespaces is a list of namespaces to match. + if not set, all namespaces will be matched. 
+ items: + type: string + type: array + type: object + required: + - execs + type: object + status: + description: Status holds status for cluster exec + properties: + conditions: + description: Conditions holds conditions for cluster exec. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. 
+ The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterlogs.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterlogs.yaml new file mode 100644 index 0000000000..07be1e43db --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterlogs.yaml @@ -0,0 +1,150 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: clusterlogs.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: ClusterLogs + listKind: ClusterLogsList + plural: clusterlogs + singular: clusterlogs + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: ClusterLogs provides cluster-wide logging configuration + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for cluster logs. + properties: + logs: + description: Logs is a list of log configurations. + items: + description: Log holds information on how to forward logs. + properties: + containers: + description: Containers is a list of container names. + items: + type: string + type: array + follow: + description: Follow up if true + type: boolean + logsFile: + description: LogsFile is the file from which the log forward + starts + type: string + previousLogsFile: + description: PreviousLogsFile is the file containing previous + container logs + type: string + type: object + type: array + selector: + description: Selector is a selector to filter pods to configure. + properties: + matchNames: + description: |- + MatchNames is a list of names to match. + if not set, all names will be matched. + items: + type: string + type: array + matchNamespaces: + description: |- + MatchNamespaces is a list of namespaces to match. + if not set, all namespaces will be matched. + items: + type: string + type: array + type: object + required: + - logs + type: object + status: + description: Status holds status for cluster logs + properties: + conditions: + description: Conditions holds conditions for cluster logs. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string.
+ maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. + The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterportforwards.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterportforwards.yaml new file mode 100644 index 0000000000..4fb4917c12 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterportforwards.yaml @@ -0,0 +1,166 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: clusterportforwards.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: 
ClusterPortForward + listKind: ClusterPortForwardList + plural: clusterportforwards + singular: clusterportforward + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: ClusterPortForward provides cluster-wide port forward configuration. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for cluster port forward. + properties: + forwards: + description: Forwards is a list of forwards to configure. + items: + description: Forward holds information how to forward based on ports. + properties: + command: + description: |- + Command is the command to run to forward with stdin/stdout. + if set, Target will be ignored. + items: + type: string + type: array + ports: + description: |- + Ports is a list of ports to forward. + if not set, all ports will be forwarded. + items: + format: int32 + type: integer + type: array + target: + description: Target is the target to forward to. + properties: + address: + description: Address is the address to forward to. + minLength: 1 + type: string + port: + description: Port is the port to forward to. 
+ format: int32 + maximum: 65535 + minimum: 0 + type: integer + required: + - address + - port + type: object + type: object + type: array + selector: + description: Selector is a selector to filter pods to configure. + properties: + matchNames: + description: |- + MatchNames is a list of names to match. + if not set, all names will be matched. + items: + type: string + type: array + matchNamespaces: + description: |- + MatchNamespaces is a list of namespaces to match. + if not set, all namespaces will be matched. + items: + type: string + type: array + type: object + required: + - forwards + type: object + status: + description: Status holds status for cluster port forward + properties: + conditions: + description: Conditions holds conditions for cluster port forward. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. 
+ Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. + The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterresourceusages.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterresourceusages.yaml new file mode 100644 index 0000000000..77a7f60cf0 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_clusterresourceusages.yaml @@ -0,0 +1,156 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: clusterresourceusages.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: ClusterResourceUsage + listKind: ClusterResourceUsageList + plural: clusterresourceusages + singular: clusterresourceusage + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: ClusterResourceUsage provides cluster-wide resource usage. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for cluster resource usage. + properties: + selector: + description: Selector is a selector to filter pods to configure. + properties: + matchNames: + description: |- + MatchNames is a list of names to match. + if not set, all names will be matched. + items: + type: string + type: array + matchNamespaces: + description: |- + MatchNamespaces is a list of namespaces to match. + if not set, all namespaces will be matched. + items: + type: string + type: array + type: object + usages: + description: Usages is a list of resource usage for the pod. + items: + description: ResourceUsageContainer holds spec for resource usage + container. + properties: + containers: + description: Containers is list of container names. + items: + type: string + type: array + usage: + additionalProperties: + description: ResourceUsageValue holds value for resource usage. + properties: + expression: + description: Expression is the expression for resource + usage. + type: string + value: + anyOf: + - type: integer + - type: string + description: Value is the value for resource usage. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + type: object + description: Usage is a list of resource usage for the container. 
+ type: object + type: object + type: array + type: object + status: + description: Status holds status for cluster resource usage + properties: + conditions: + description: Conditions holds conditions for cluster resource usage + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. 
+ The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_execs.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_execs.yaml new file mode 100644 index 0000000000..e61c84b867 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_execs.yaml @@ -0,0 +1,163 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: execs.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: Exec + listKind: ExecList + plural: execs + singular: exec + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Exec provides exec configuration for a single pod. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for exec + properties: + execs: + description: Execs is a list of execs to configure. + items: + description: ExecTarget holds information how to exec. + properties: + containers: + description: |- + Containers is a list of containers to exec. + if not set, all containers will be execed. + items: + type: string + type: array + local: + description: Local holds information how to exec to a local + target. + properties: + envs: + description: Envs is a list of environment variables to + exec with. + items: + description: EnvVar represents an environment variable + present in a Container. + properties: + name: + description: Name of the environment variable. + minLength: 1 + type: string + value: + description: Value of the environment variable. + type: string + required: + - name + type: object + type: array + securityContext: + description: SecurityContext is the user context to exec. + properties: + runAsGroup: + description: RunAsGroup is the existing gid to run exec + command in container process. + format: int64 + type: integer + runAsUser: + description: RunAsUser is the existing uid to run exec + command in container process. + format: int64 + type: integer + type: object + workDir: + description: WorkDir is the working directory to exec with. + type: string + type: object + type: object + type: array + required: + - execs + type: object + status: + description: Status holds status for exec + properties: + conditions: + description: Conditions holds conditions for exec + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. 
+ This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. 
+ The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_logs.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_logs.yaml new file mode 100644 index 0000000000..0d9943ba12 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_logs.yaml @@ -0,0 +1,132 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: logs.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: Logs + listKind: LogsList + plural: logs + singular: logs + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Logs provides logging configuration for a single pod. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for logs + properties: + logs: + description: Logs is a list of logs to configure. + items: + description: Log holds information how to forward logs. + properties: + containers: + description: Containers is list of container names. + items: + type: string + type: array + follow: + description: Follow up if true + type: boolean + logsFile: + description: LogsFile is the file from which the log forward + starts + type: string + previousLogsFile: + description: PreviousLogsFile is the file containing previous + container logs + type: string + type: object + type: array + required: + - logs + type: object + status: + description: Status holds status for logs + properties: + conditions: + description: Conditions holds conditions for logs + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. 
+ maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. + The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_metrics.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_metrics.yaml new file mode 100644 index 0000000000..07ed702dd0 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_metrics.yaml @@ -0,0 +1,193 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: metrics.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: Metric + listKind: MetricList + plural: metrics + singular: metric + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Metric provides metrics configuration. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for metrics. + properties: + metrics: + description: Metrics is a list of metric configurations. + items: + description: MetricConfig provides metric configuration to a single + metric + properties: + buckets: + description: Buckets is a list of buckets for a histogram metric. + items: + description: MetricBucket is a single bucket for a metric. + properties: + hidden: + description: |- + Hidden is means that this bucket not shown in the metric. + but value will be calculated and cumulative into the next bucket. + type: boolean + le: + description: Le is less-than or equal. + minimum: 0 + type: number + value: + description: Value is a CEL expression. + type: string + required: + - le + - value + type: object + type: array + x-kubernetes-list-map-keys: + - le + x-kubernetes-list-type: map + dimension: + default: node + description: Dimension is a dimension of the metric. + type: string + help: + description: Help provides information about this metric. + type: string + kind: + description: Kind is kind of metric + enum: + - counter + - gauge + - histogram + type: string + labels: + description: Labels are metric labels. + items: + description: MetricLabel holds label name and the value of + the label. + properties: + name: + description: Name is a label name. + minLength: 1 + type: string + value: + description: Value is a CEL expression. 
+ minLength: 1 + type: string + required: + - name + - value + type: object + type: array + x-kubernetes-list-map-keys: + - name + x-kubernetes-list-type: map + name: + description: Name is the fully-qualified name of the metric. + minLength: 1 + type: string + value: + description: Value is a CEL expression. + type: string + required: + - kind + - name + type: object + type: array + path: + description: Path is a restful service path. + minLength: 1 + type: string + required: + - metrics + - path + type: object + status: + description: Status holds status for metrics + properties: + conditions: + description: Conditions holds conditions for metrics. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. 
+ Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. + The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_portforwards.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_portforwards.yaml new file mode 100644 index 0000000000..f8e389fcb4 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_portforwards.yaml @@ -0,0 +1,149 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: portforwards.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: PortForward + listKind: PortForwardList + plural: portforwards + singular: portforward + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: PortForward provides port forward configuration for a single + pod. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for port forward. + properties: + forwards: + description: Forwards is a list of forwards to configure. + items: + description: Forward holds information how to forward based on ports. + properties: + command: + description: |- + Command is the command to run to forward with stdin/stdout. + if set, Target will be ignored. + items: + type: string + type: array + ports: + description: |- + Ports is a list of ports to forward. + if not set, all ports will be forwarded. + items: + format: int32 + type: integer + type: array + target: + description: Target is the target to forward to. + properties: + address: + description: Address is the address to forward to. + minLength: 1 + type: string + port: + description: Port is the port to forward to. + format: int32 + maximum: 65535 + minimum: 0 + type: integer + required: + - address + - port + type: object + type: object + type: array + required: + - forwards + type: object + status: + description: Status holds status for port forward + properties: + conditions: + description: Conditions holds conditions for port forward + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. 
If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. 
+ The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_resourceusages.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_resourceusages.yaml new file mode 100644 index 0000000000..f51b7aba3c --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_resourceusages.yaml @@ -0,0 +1,138 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: resourceusages.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: ResourceUsage + listKind: ResourceUsageList + plural: resourceusages + singular: resourceusage + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: ResourceUsage provides resource usage for a single pod. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds spec for resource usage. + properties: + usages: + description: Usages is a list of resource usage for the pod. + items: + description: ResourceUsageContainer holds spec for resource usage + container. + properties: + containers: + description: Containers is list of container names. + items: + type: string + type: array + usage: + additionalProperties: + description: ResourceUsageValue holds value for resource usage. + properties: + expression: + description: Expression is the expression for resource + usage. + type: string + value: + anyOf: + - type: integer + - type: string + description: Value is the value for resource usage. + pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ + x-kubernetes-int-or-string: true + type: object + description: Usage is a list of resource usage for the container. + type: object + type: object + type: array + type: object + status: + description: Status holds status for resource usage + properties: + conditions: + description: Conditions holds conditions for resource usage + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. 
+ maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. + The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - metadata + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_stages.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_stages.yaml new file mode 100644 index 0000000000..46b90c4755 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/crds/kwok.x-k8s.io_stages.yaml @@ -0,0 +1,438 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.16.1 + name: stages.kwok.x-k8s.io +spec: + group: kwok.x-k8s.io + names: + kind: Stage + listKind: StageList + plural: stages + singular: 
stage + scope: Cluster + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Stage is an API that describes the staged change of a resource + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: Spec holds information about the request being evaluated. + properties: + delay: + description: Delay means there is a delay in this stage. + properties: + durationFrom: + description: |- + DurationFrom is the expression used to get the value. + If it is a time.Time type, getting the value will be minus time.Now() to get DurationMilliseconds + If it is a string type, the value get will be parsed by time.ParseDuration. + properties: + cel: + description: CEL is a Common Expression Language based expression + for value extraction + properties: + expression: + description: Expression represents the expression which + will be evaluated by CEL. + type: string + type: object + expressionFrom: + description: |- + ExpressionFrom is the expression used to get the value. + Deprecated: Use JQ instead. + type: string + jq: + description: JQ is a JSON Query based expression for value + extraction + properties: + expression: + description: Expression represents the expression which + will be evaluated by JQ. 
+ type: string + type: object + type: object + durationMilliseconds: + description: |- + DurationMilliseconds indicates the stage delay time. + If JitterDurationMilliseconds is less than DurationMilliseconds, then JitterDurationMilliseconds is used. + format: int64 + minimum: 0 + type: integer + jitterDurationFrom: + description: |- + JitterDurationFrom is the expression used to get the value. + If it is a time.Time type, getting the value will be minus time.Now() to get JitterDurationMilliseconds + If it is a string type, the value get will be parsed by time.ParseDuration. + properties: + cel: + description: CEL is a Common Expression Language based expression + for value extraction + properties: + expression: + description: Expression represents the expression which + will be evaluated by CEL. + type: string + type: object + expressionFrom: + description: |- + ExpressionFrom is the expression used to get the value. + Deprecated: Use JQ instead. + type: string + jq: + description: JQ is a JSON Query based expression for value + extraction + properties: + expression: + description: Expression represents the expression which + will be evaluated by JQ. + type: string + type: object + type: object + jitterDurationMilliseconds: + description: |- + JitterDurationMilliseconds is the duration plus an additional amount chosen uniformly + at random from the interval between DurationMilliseconds and JitterDurationMilliseconds. + format: int64 + minimum: 0 + type: integer + type: object + immediateNextStage: + description: ImmediateNextStage means that the next stage of matching + is performed immediately, without waiting for the Apiserver to push. + type: boolean + next: + description: Next indicates that this stage will be moved to. + properties: + delete: + description: Delete means that the resource will be deleted if + true. + type: boolean + event: + description: Event means that an event will be sent. 
+ properties: + message: + description: Message is a human-readable description of the + status of this operation. + type: string + reason: + description: Reason is why the action was taken. It is human-readable. + type: string + type: + description: Type is the type of this event (Normal, Warning), + It is machine-readable. + type: string + type: object + finalizers: + description: Finalizers means that finalizers will be modified. + properties: + add: + description: Add means that the Finalizers will be added to + the resource. + items: + description: FinalizerItem describes the one of the finalizers. + properties: + value: + description: Value is the value of the finalizer. + type: string + type: object + type: array + empty: + description: Empty means that the Finalizers for that resource + will be emptied. + type: boolean + remove: + description: Remove means that the Finalizers will be removed + from the resource. + items: + description: FinalizerItem describes the one of the finalizers. + properties: + value: + description: Value is the value of the finalizer. + type: string + type: object + type: array + type: object + patches: + description: Patches means that the resource will be patched. + items: + description: StagePatch describes the patch for the resource. + properties: + impersonation: + description: |- + Impersonation indicates the impersonating configuration for client when patching status. + In most cases this will be empty, in which case the default client service account will be used. + When this is not empty, a corresponding rbac change is required to grant `impersonate` privilege. + The support for this field is not available in Pod and Node resources. + properties: + username: + description: Username the target username for the client + to impersonate + type: string + required: + - username + type: object + root: + description: Root indicates the root of the template calculated + by the patch. 
+ type: string + subresource: + description: Subresource indicates the name of the subresource + that will be patched. + type: string + template: + description: Template indicates the template for modifying + the resource in the next. + type: string + type: + description: Type indicates the type of the patch. + enum: + - json + - merge + - strategic + type: string + type: object + type: array + statusPatchAs: + description: |- + StatusPatchAs indicates the impersonating configuration for client when patching status. + In most cases this will be empty, in which case the default client service account will be used. + When this is not empty, a corresponding rbac change is required to grant `impersonate` privilege. + The support for this field is not available in Pod and Node resources. + Deprecated: Use Patches instead. + properties: + username: + description: Username the target username for the client to + impersonate + type: string + required: + - username + type: object + statusSubresource: + default: status + description: |- + StatusSubresource indicates the name of the subresource that will be patched. The support for + this field is not available in Pod and Node resources. + Deprecated: Use Patches instead. + type: string + statusTemplate: + description: |- + StatusTemplate indicates the template for modifying the status of the resource in the next. + Deprecated: Use Patches instead. + type: string + type: object + resourceRef: + description: ResourceRef specifies the Kind and version of the resource. + properties: + apiGroup: + default: v1 + description: APIGroup of the referent. + type: string + kind: + description: Kind of the referent. + type: string + required: + - kind + type: object + selector: + description: Selector specifies the stags will be applied to the selected + resource. + properties: + matchAnnotations: + additionalProperties: + type: string + description: |- + MatchAnnotations is a map of {key,value} pairs. 
A single {key,value} in the matchAnnotations + map is equivalent to an element of matchExpressions, whose key field is ".metadata.annotations[key]", the + operator is "In", and the values array contains only "value". The requirements are ANDed. + type: object + matchExpressions: + description: MatchExpressions is a list of label selector expressions. + The requirements are ANDed. + items: + description: MatchExpression is a resource selector expression + that must evaluate to true for a resource to be matched. + properties: + cel: + description: CEL is a Common Expression Language based selector + expression + properties: + expression: + description: Expression represents the expression which + will be evaluated by CEL. + type: string + type: object + jq: + description: JQ is a JSON Query based selector expression + properties: + key: + description: Key represents the expression which will + be evaluated by JQ. + type: string + operator: + description: Represents a scope's relationship to a + set of values. + type: string + values: + description: |- + An array of string values. + If the operator is In, NotIn, Intersection or NotIntersection, the values array must be non-empty. + If the operator is Exists or DoesNotExist, the values array must be empty. + items: + type: string + type: array + type: object + key: + description: |- + Key represents the expression which will be evaluated by JQ. + Deprecated: Use JQ instead. + type: string + operator: + description: |- + Represents a scope's relationship to a set of values. + Deprecated: Use JQ instead. + type: string + values: + description: |- + An array of string values. + If the operator is In, NotIn, Intersection or NotIntersection, the values array must be non-empty. + If the operator is Exists or DoesNotExist, the values array must be empty. + Deprecated: Use JQ instead. 
+ items: + type: string + type: array + type: object + type: array + matchLabels: + additionalProperties: + type: string + description: |- + MatchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels + map is equivalent to an element of matchExpressions, whose key field is ".metadata.labels[key]", the + operator is "In", and the values array contains only "value". The requirements are ANDed. + type: object + type: object + weight: + default: 0 + description: |- + Weight means when multiple stages share the same ResourceRef and Selector, + a random stage will be matched as the next stage based on the weight. + minimum: 0 + type: integer + weightFrom: + description: |- + WeightFrom means is the expression used to get the value. + If it is a number type, convert to int. + If it is a string type, the value get will be parsed by strconv.ParseInt. + properties: + cel: + description: CEL is a Common Expression Language based expression + for value extraction + properties: + expression: + description: Expression represents the expression which will + be evaluated by CEL. + type: string + type: object + expressionFrom: + description: |- + ExpressionFrom is the expression used to get the value. + Deprecated: Use JQ instead. + type: string + jq: + description: JQ is a JSON Query based expression for value extraction + properties: + expression: + description: Expression represents the expression which will + be evaluated by JQ. + type: string + type: object + type: object + required: + - next + - resourceRef + type: object + status: + description: Status holds status for the Stage + properties: + conditions: + description: Conditions holds conditions for the Stage. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + LastTransitionTime is the last time the condition transitioned from one status to another. 
+ This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + Message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + reason: + description: |- + Reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: Status of the condition + type: string + type: + description: |- + Type of condition in CamelCase or in foo.example.com/CamelCase. + Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be + useful (see .node.status.conditions), the ability to deconflict is important. 
+ The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} + diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/deployment.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/deployment.yaml new file mode 100644 index 0000000000..2545bac953 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/deployment.yaml @@ -0,0 +1,92 @@ +--- +# Source: kwok/templates/deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: kwok-controller + namespace: kwok-system + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + template: + metadata: + labels: + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + spec: + serviceAccountName: kwok-controller + restartPolicy: Always + containers: + - name: kwok + image: "registry.k8s.io/kwok/kwok:v0.7.0" + args: + - --config=/root/.kwok/kwok.yaml + - --node-ip=$(POD_IP) + env: + + - name: POD_IP + valueFrom: + fieldRef: + fieldPath: status.podIP + - name: HOST_IP + valueFrom: + fieldRef: + fieldPath: status.hostIP + imagePullPolicy: IfNotPresent + securityContext: + {} + livenessProbe: + failureThreshold: 10 + httpGet: + path: /healthz + port: 10247 + scheme: HTTP + initialDelaySeconds: 30 + periodSeconds: 60 + timeoutSeconds: 10 + readinessProbe: + failureThreshold: 5 + 
httpGet: + path: /healthz + port: 10247 + scheme: HTTP + initialDelaySeconds: 2 + periodSeconds: 20 + timeoutSeconds: 2 + startupProbe: + failureThreshold: 3 + httpGet: + path: /healthz + port: 10247 + scheme: HTTP + initialDelaySeconds: 2 + periodSeconds: 10 + timeoutSeconds: 2 + volumeMounts: + - name: kwok-config + subPath: kwok.yaml + mountPath: /root/.kwok/kwok.yaml + readOnly: true + resources: + {} + tolerations: + - effect: NoSchedule + key: node-role.kubernetes.io/control-plane + operator: Exists + - effect: NoSchedule + key: node-role.kubernetes.io/master + operator: Exists + hostNetwork: false + volumes: + - name: kwok-config + configMap: + name: kwok-controller diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/device-class.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/device-class.yaml new file mode 100644 index 0000000000..1751b397f0 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/device-class.yaml @@ -0,0 +1,9 @@ +--- +apiVersion: resource.k8s.io/v1beta2 +kind: DeviceClass +metadata: + name: cl2-gpu.kwok.x-k8s.io +spec: + selectors: + - cel: + expression: "device.driver == 'cl2-gpu.kwok.x-k8s.io'" \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/flow_schema.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/flow_schema.yaml new file mode 100644 index 0000000000..87989dd243 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/flow_schema.yaml @@ -0,0 +1,37 @@ +--- +# Source: kwok/templates/flow_schema.yaml +apiVersion: flowcontrol.apiserver.k8s.io/v1 +kind: FlowSchema +metadata: + name: kwok-controller + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm +spec: + priorityLevelConfiguration: + name: exempt + matchingPrecedence: 1000 + rules: + - nonResourceRules: + - nonResourceURLs: + - '*' + verbs: + - '*' + 
resourceRules: + - apiGroups: + - '*' + clusterScope: true + namespaces: + - '*' + resources: + - '*' + verbs: + - '*' + subjects: + - kind: ServiceAccount + serviceAccount: + name: kwok-controller + namespace: kwok-system diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/job-completion-stages.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/job-completion-stages.yaml new file mode 100644 index 0000000000..ac0e9086dd --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/job-completion-stages.yaml @@ -0,0 +1,84 @@ +--- +apiVersion: kwok.x-k8s.io/v1alpha1 +kind: Stage +metadata: + name: job-complete-short +spec: + resourceRef: + apiGroup: v1 + kind: Pod + selector: + matchExpressions: + - key: '.metadata.ownerReferences.[].kind' + operator: 'In' + values: ['Job'] + - key: '.metadata.labels.job-type' + operator: 'In' + values: ['short-lived'] + - key: '.status.phase' + operator: 'In' + values: ['Running'] + delay: + durationMilliseconds: {{.CL2_JOB_RUNNING_TIME_MS | DefaultParam 30000}} + next: + statusTemplate: | + {{print "{{ $now := Now }}"}} + {{print "{{ $root := . 
}}"}} + containerStatuses: + {{print "{{ range $index, $item := .spec.containers }}"}} + - image: {{print "{{ $item.image | Quote }}"}} + name: {{print "{{ $item.name | Quote }}"}} + ready: false + restartCount: 0 + started: false + state: + terminated: + exitCode: 0 + finishedAt: {{print "{{ $now | Quote }}"}} + reason: Completed + startedAt: {{print "{{ $now | Quote }}"}} + {{print "{{ end }}"}} + phase: Succeeded + immediateNextStage: true +--- +apiVersion: kwok.x-k8s.io/v1alpha1 +kind: Stage +metadata: + name: job-complete-long +spec: + resourceRef: + apiGroup: v1 + kind: Pod + selector: + matchExpressions: + - key: '.metadata.ownerReferences.[].kind' + operator: 'In' + values: ['Job'] + - key: '.metadata.labels.job-type' + operator: 'In' + values: ['long-running'] + - key: '.status.phase' + operator: 'In' + values: ['Running'] + delay: + durationMilliseconds: {{.CL2_LONG_JOB_RUNNING_TIME_MS | DefaultParam 3600000}} + next: + statusTemplate: | + {{print "{{ $now := Now }}"}} + {{print "{{ $root := . 
}}"}} + containerStatuses: + {{print "{{ range $index, $item := .spec.containers }}"}} + - image: {{print "{{ $item.image | Quote }}"}} + name: {{print "{{ $item.name | Quote }}"}} + ready: false + restartCount: 0 + started: false + state: + terminated: + exitCode: 0 + finishedAt: {{print "{{ $now | Quote }}"}} + reason: Completed + startedAt: {{print "{{ $now | Quote }}"}} + {{print "{{ end }}"}} + phase: Succeeded + immediateNextStage: true \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/kwok.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/kwok.yaml new file mode 100644 index 0000000000..36edeba79b --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/kwok.yaml @@ -0,0 +1,45 @@ +--- +# Source: kwok/templates/kwok.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: kwok-controller + namespace: kwok-system + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm +data: + kwok.yaml: |- + apiVersion: config.kwok.x-k8s.io/v1alpha1 + kind: KwokConfiguration + options: + enableProfilingHandler: false + enableContentionProfiling: false + enablePodsOnNodeSyncListPager: false + enablePodsOnNodeSyncStreamWatch: true + nodeLeaseParallelism: 4 + podPlayStageParallelism: 4 + nodePlayStageParallelism: 4 + nodePort: 10247 + cidr: 10.0.0.1/24 + manageAllNodes: false + manageNodesWithAnnotationSelector: 'kwok.x-k8s.io/node=fake' + manageNodesWithLabelSelector: '' + manageSingleNode: '' + nodeLeaseDurationSeconds: 40 + enableCRDs: + - Stage + - Metric + - Attach + - ClusterAttach + - Exec + - ClusterExec + - Logs + - ClusterLogs + - PortForward + - ClusterPortForward + - ResourceUsage + - ClusterResourceUsage diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/role.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/role.yaml new file mode 100644 index 
0000000000..e14f5419f7 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/role.yaml @@ -0,0 +1,104 @@ +--- +# Source: kwok/templates/role.yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: kwok-controller + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm +rules: +- apiGroups: + - "" + resources: + - events + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - "" + resources: + - nodes + verbs: + - get + - list + - watch +- apiGroups: + - "" + resources: + - nodes/status + - pods/status + verbs: + - patch + - update +- apiGroups: + - "" + resources: + - pods + verbs: + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - coordination.k8s.io + resources: + - leases + verbs: + - create + - get + - list + - patch + - update + - watch +- apiGroups: + - kwok.x-k8s.io + resources: + - attaches + - clusterattaches + - clusterexecs + - clusterlogs + - clusterportforwards + - clusterresourceusages + - execs + - logs + - metrics + - portforwards + - resourceusages + - stages + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - kwok.x-k8s.io + resources: + - attaches/status + - clusterattaches/status + - clusterexecs/status + - clusterlogs/status + - clusterportforwards/status + - clusterresourceusages/status + - execs/status + - logs/status + - metrics/status + - portforwards/status + - resourceusages/status + - stages/status + verbs: + - patch + - update diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/role_binding.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/role_binding.yaml new file mode 100644 index 0000000000..deb6b34eca --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/role_binding.yaml @@ -0,0 +1,20 @@ +--- +# Source: 
kwok/templates/role_binding.yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: kwok-controller + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: kwok-controller +subjects: +- kind: ServiceAccount + name: kwok-controller + namespace: kwok-system diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/service.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/service.yaml new file mode 100644 index 0000000000..6d703d7370 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/service.yaml @@ -0,0 +1,23 @@ +--- +# Source: kwok/templates/service.yaml +apiVersion: v1 +kind: Service +metadata: + name: kwok-controller + namespace: kwok-system + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm +spec: + ports: + - name: http + port: 10247 + protocol: TCP + targetPort: 10247 + selector: + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + type: ClusterIP diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/service_account.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/service_account.yaml new file mode 100644 index 0000000000..17ccb1d636 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/service_account.yaml @@ -0,0 +1,13 @@ +--- +# Source: kwok/templates/service_account.yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: kwok-controller + labels: + helm.sh/chart: kwok-0.2.0 + app.kubernetes.io/name: kwok + app.kubernetes.io/instance: kwok-controller + app.kubernetes.io/version: "v0.7.0" + app.kubernetes.io/managed-by: Helm + namespace: kwok-system diff --git 
a/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_node.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_node.yaml new file mode 100644 index 0000000000..2e4853086d --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_node.yaml @@ -0,0 +1,54 @@ +apiVersion: kwok.x-k8s.io/v1alpha1 +kind: Stage +metadata: + name: node-heartbeat-with-lease +spec: + delay: + durationMilliseconds: 600000 + jitterDurationMilliseconds: 610000 + next: + statusTemplate: | + {{print "{{ $now := Now }}"}} + {{print "{{ $lastTransitionTime := or .metadata.creationTimestamp $now }}"}} + conditions: + {{print "{{ range NodeConditions }}"}} + - lastHeartbeatTime: {{print "{{ $now | Quote }}"}} + lastTransitionTime: {{print "{{ $lastTransitionTime | Quote }}"}} + message: {{print "{{ .message | Quote }}"}} + reason: {{print "{{ .reason | Quote }}"}} + status: {{print "{{ .status | Quote }}"}} + type: {{print "{{ .type | Quote }}"}} + {{print "{{ end }}"}} + + addresses: + {{print "{{ with .status.addresses }}"}} + {{print "{{ YAML . 1 }}"}} + {{print "{{ else }}"}} + {{print "{{ with NodeIP }}"}} + - address: {{print "{{ . | Quote }}"}} + type: InternalIP + {{print "{{ end }}"}} + {{print "{{ with NodeName }}"}} + - address: {{print "{{ . | Quote }}"}} + type: Hostname + {{print "{{ end }}"}} + {{print "{{ end }}"}} + + {{print "{{ with NodePort }}"}} + daemonEndpoints: + kubeletEndpoint: + Port: {{print "{{ . 
}}"}} + {{print "{{ end }}"}} + resourceRef: + apiGroup: v1 + kind: Node + selector: + matchExpressions: + - key: .status.phase + operator: In + values: + - Running + - key: .status.conditions.[] | select( .type == "Ready" ) | .status + operator: In + values: + - "True" \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_node_initialize.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_node_initialize.yaml new file mode 100644 index 0000000000..3237a86152 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_node_initialize.yaml @@ -0,0 +1,80 @@ +--- +apiVersion: kwok.x-k8s.io/v1alpha1 +kind: Stage +metadata: + name: node-initialize +spec: + next: + statusTemplate: | + {{print "{{ $now := Now }}"}} + {{print "{{ $lastTransitionTime := or .metadata.creationTimestamp $now }}"}} + conditions: + {{print "{{ range NodeConditions }}"}} + - lastHeartbeatTime: {{print "{{ $now | Quote }}"}} + lastTransitionTime: {{print "{{ $lastTransitionTime | Quote }}"}} + message: {{print "{{ .message | Quote }}"}} + reason: {{print "{{ .reason | Quote }}"}} + status: {{print "{{ .status | Quote }}"}} + type: {{print "{{ .type | Quote}}"}} + {{print "{{ end }}"}} + + addresses: + {{print "{{ with .status.addresses }}"}} + {{print "{{ YAML . 1 }}"}} + {{print "{{ else }}"}} + {{print "{{ with NodeIP }}"}} + - address: {{print "{{ . | Quote }}"}} + type: InternalIP + {{print "{{ end }}"}} + {{print "{{ with NodeName }}"}} + - address: {{print "{{ . | Quote }}"}} + type: Hostname + {{print "{{ end }}"}} + {{print "{{ end }}"}} + + {{print "{{ with NodePort }}"}} + daemonEndpoints: + kubeletEndpoint: + Port: {{print "{{ . }}"}} + {{print "{{ end }}"}} + + allocatable: + {{print "{{ with .status.allocatable }}"}} + {{print "{{ YAML . 1 }}"}} + {{print "{{ else }}"}} + cpu: 1k + memory: 1Ti + pods: 1M + {{print "{{ end }}"}} + capacity: + {{print "{{ with .status.capacity }}"}} + {{print "{{ YAML . 
1 }}"}} + {{print "{{ else }}"}} + cpu: 1k + memory: 1Ti + pods: 1M + {{print "{{ end }}"}} + + {{print "{{ $nodeInfo := .status.nodeInfo }}"}} + {{print "{{ $kwokVersion := printf \"kwok-%s\" Version }}"}} + nodeInfo: + architecture: {{print "{{ or $nodeInfo.architecture \"amd64\" }}"}} + bootID: {{print "{{ or $nodeInfo.bootID \"\" }}"}} + containerRuntimeVersion: {{print "{{ or $nodeInfo.containerRuntimeVersion $kwokVersion }}"}} + kernelVersion: {{print "{{ or $nodeInfo.kernelVersion $kwokVersion }}"}} + kubeProxyVersion: {{print "{{ or $nodeInfo.kubeProxyVersion $kwokVersion }}"}} + kubeletVersion: {{print "{{ or $nodeInfo.kubeletVersion $kwokVersion }}"}} + machineID: {{print "{{ or $nodeInfo.machineID \"\" }}"}} + operatingSystem: {{print "{{ or $nodeInfo.operatingSystem \"linux\" }}"}} + osImage: {{print "{{ or $nodeInfo.osImage \"\" }}"}} + systemUUID: {{print "{{ or $nodeInfo.systemUUID \"\" }}"}} + phase: Running + resourceRef: + apiGroup: v1 + kind: Node + selector: + matchExpressions: + - key: .status.conditions.[] | select( .type == "Ready" ) | .status + operator: NotIn + values: + - "True" \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_pod_delete.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_pod_delete.yaml new file mode 100644 index 0000000000..aca91a0efd --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_pod_delete.yaml @@ -0,0 +1,17 @@ +--- +apiVersion: kwok.x-k8s.io/v1alpha1 +kind: Stage +metadata: + name: pod-delete +spec: + next: + delete: true + finalizers: + empty: true + resourceRef: + apiGroup: v1 + kind: Pod + selector: + matchExpressions: + - key: .metadata.deletionTimestamp + operator: Exists \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_pod_ready.yaml b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_pod_ready.yaml new file mode 100644 index 0000000000..bc994d85e9 --- 
/dev/null +++ b/clusterloader2/pkg/dependency/kwok/dra/manifests/stage_fast_pod_ready.yaml @@ -0,0 +1,71 @@ +--- +apiVersion: kwok.x-k8s.io/v1alpha1 +kind: Stage +metadata: + name: pod-ready +spec: + next: + statusTemplate: | + {{print "{{ $now := Now }}"}} + + conditions: + - lastTransitionTime: {{print "{{ $now | Quote }}"}} + status: "True" + type: Initialized + - lastTransitionTime: {{print "{{ $now | Quote }}"}} + status: "True" + type: Ready + - lastTransitionTime: {{print "{{ $now | Quote }}"}} + status: "True" + type: ContainersReady + {{print "{{ range .spec.readinessGates }}"}} + - lastTransitionTime: {{print "{{ $now | Quote }}"}} + status: "True" + type: {{print "{{ .conditionType | Quote }}"}} + {{print "{{ end }}"}} + + containerStatuses: + {{print "{{ range .spec.containers }}"}} + - image: {{print "{{ .image | Quote }}"}} + name: {{print "{{ .name | Quote }}"}} + ready: true + restartCount: 0 + state: + running: + startedAt: {{print "{{ $now | Quote }}"}} + {{print "{{ end }}"}} + + initContainerStatuses: + {{print "{{ range .spec.initContainers }}"}} + - image: {{print "{{ .image | Quote }}"}} + name: {{print "{{ .name | Quote }}"}} + ready: true + restartCount: 0 + {{print "{{ if eq .restartPolicy \"Always\" }}"}} + started: true + state: + running: + startedAt: {{print "{{ $now | Quote }}"}} + {{print "{{ else }}"}} + state: + terminated: + exitCode: 0 + finishedAt: {{print "{{ $now | Quote }}"}} + reason: Completed + startedAt: {{print "{{ $now | Quote }}"}} + {{print "{{ end }}"}} + {{print "{{ end }}"}} + + hostIP: {{print "{{ NodeIPWith .spec.nodeName | Quote }}"}} + podIP: {{print "{{ PodIPWith .spec.nodeName ( or .spec.hostNetwork false ) ( or .metadata.uid \"\" ) ( or .metadata.name \"\" ) ( or .metadata.namespace \"\" ) | Quote }}"}} + phase: Running + startTime: {{print "{{ $now | Quote }}"}} + resourceRef: + apiGroup: v1 + kind: Pod + selector: + matchExpressions: + - key: .metadata.deletionTimestamp + operator: DoesNotExist + - key: 
.status.podIP + operator: DoesNotExist diff --git a/clusterloader2/pkg/dependency/kwok/examples/kwok-gpu-job.yaml b/clusterloader2/pkg/dependency/kwok/examples/kwok-gpu-job.yaml new file mode 100644 index 0000000000..9ffebab4aa --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/examples/kwok-gpu-job.yaml @@ -0,0 +1,35 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: {{.Name}} + labels: + group: kwok-gpu-job + job-type: short-lived +spec: + parallelism: {{.Replicas}} + completions: {{.Replicas}} + completionMode: {{.Mode}} + activeDeadlineSeconds: 3600 # 1 hour + template: + metadata: + labels: + group: kwok-gpu-pod + job-type: short-lived + spec: + restartPolicy: Never + tolerations: + - key: kwok.x-k8s.io/node + operator: Equal + value: fake + effect: NoSchedule + containers: + - name: {{.Name}} + image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3 + args: + - {{.Sleep}} + resources: + claims: + - name: gpu + resourceClaims: + - name: gpu + resourceClaimTemplateName: kwok-gpu-claim-template-0 \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/examples/kwok-gpu-resource-claim-template.yaml b/clusterloader2/pkg/dependency/kwok/examples/kwok-gpu-resource-claim-template.yaml new file mode 100644 index 0000000000..83036c262c --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/examples/kwok-gpu-resource-claim-template.yaml @@ -0,0 +1,12 @@ +apiVersion: resource.k8s.io/v1beta2 +kind: ResourceClaimTemplate +metadata: + name: kwok-gpu-claim-template +spec: + spec: + devices: + requests: + - name: gpu + exactly: + deviceClassName: cl2-gpu.kwok.x-k8s.io + count: 1 \ No newline at end of file diff --git a/clusterloader2/pkg/dependency/kwok/examples/test-config.yaml b/clusterloader2/pkg/dependency/kwok/examples/test-config.yaml new file mode 100644 index 0000000000..3056808ad1 --- /dev/null +++ b/clusterloader2/pkg/dependency/kwok/examples/test-config.yaml @@ -0,0 +1,62 @@ +# test-config.yaml +name: kwok-dra-test + +tuningSets: +- name: 
gpu-job-creation + qpsLoad: + qps: 5 + +dependencies: +- name: Install KWOK DRA for test + Method: DRAKWOKDriver + Params: + nodes: 3 + gpusPerNode: 8 + Timeout: 5m + +steps: +- name: Start measurements + measurements: + - Identifier: WaitForControlledPodsRunning + Method: WaitForControlledPodsRunning + Params: + action: start + apiVersion: batch/v1 + kind: Job + labelSelector: job-type = short-lived + operationTimeout: 120s + +- name: Create GPU ResourceClaimTemplate + phases: + - namespaceRange: + min: 1 + max: 1 + replicasPerNamespace: 1 + tuningSet: gpu-job-creation + objectBundle: + - basename: kwok-gpu-claim-template + objectTemplatePath: "kwok-gpu-resource-claim-template.yaml" + +- name: Create KWOK GPU jobs + phases: + - namespaceRange: + min: 1 + max: 1 + replicasPerNamespace: 10 + tuningSet: gpu-job-creation + objectBundle: + - basename: kwok-gpu-job + objectTemplatePath: "kwok-gpu-job.yaml" + templateFillMap: + Replicas: 1 + Mode: "Indexed" + Sleep: "300s" # 5 minute sleep + +- name: Wait for GPU jobs to be running + measurements: + - Identifier: WaitForControlledPodsRunning + Method: WaitForControlledPodsRunning + Params: + action: gather + labelSelector: job-type = short-lived + timeout: 15m \ No newline at end of file
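
As a quick sanity check on `examples/test-config.yaml`, the sketch below works out whether the GPU requests fit on the fake nodes. The numbers are taken from the files in this diff (3 nodes, 8 GPUs per node, 10 jobs in one namespace, `Replicas: 1`, `count: 1` in the ResourceClaimTemplate, `qps: 5`). The class and method names are hypothetical and not part of ClusterLoader2. It assumes each pod gets its own claim from the template and that the tuning set's QPS governs object creation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KwokDraPlan:
    """Hypothetical helper mirroring the values in examples/test-config.yaml."""

    nodes: int = 3               # dependencies[].Params.nodes
    gpus_per_node: int = 8       # dependencies[].Params.gpusPerNode
    namespaces: int = 1          # namespaceRange min..max
    jobs_per_namespace: int = 10 # replicasPerNamespace for the job phase
    pods_per_job: int = 1        # templateFillMap.Replicas
    gpus_per_claim: int = 1      # ResourceClaimTemplate request count
    qps: float = 5.0             # tuningSets[].qpsLoad.qps

    def gpu_capacity(self) -> int:
        # Total fake GPU devices published across all KWOK nodes.
        return self.nodes * self.gpus_per_node

    def gpu_demand(self) -> int:
        # GPUs requested if every pod of every job holds a claim at once.
        pods = self.namespaces * self.jobs_per_namespace * self.pods_per_job
        return pods * self.gpus_per_claim

    def fits(self) -> bool:
        return self.gpu_demand() <= self.gpu_capacity()

    def approx_creation_seconds(self) -> float:
        # Rough estimate only: number of Job objects divided by the QPS limit.
        jobs = self.namespaces * self.jobs_per_namespace
        return jobs / self.qps


plan = KwokDraPlan()
print(plan.gpu_capacity(), plan.gpu_demand(), plan.fits(), plan.approx_creation_seconds())
# prints: 24 10 True 2.0
```

With these values the jobs claim 10 of the 24 fake GPUs, so every pod should be schedulable at once. Raising `replicasPerNamespace` above 24 without also raising `nodes` or `gpusPerNode` would be expected to leave some pods pending, at least until earlier pods complete and release their claims.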