L3n41c
Member

@L3n41c L3n41c commented May 23, 2025

What does this PR do?

Add a DatadogAgent parameter to configure the custom resource metrics collection introduced in DataDog/datadog-agent#31715.

Motivation

Additional Notes

This is the datadog-operator equivalent of DataDog/helm-charts#1883.

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: v7.63.0 (if kubernetes-state-core is running on cluster checks runners)
  • Cluster Agent: v7.63.0 (otherwise)

Describe your test plan

Create a DatadogAgent CR with the following piece of KSM configuration:

spec:
  features:
    kubeStateMetricsCore:
      collectCustomResources:
      - groupVersionKind:
          group: datadoghq.com
          kind: DatadogAgent
          version: v2alpha1
        metrics:
        - each:
            gauge:
              path:
              - status
              - agent
              - upToDate
            type: gauge
          help: Number of up-to-date agents
          name: uptodateagents
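
For readers who think in Go types, the manifest above can be modeled roughly as follows. This is only a sketch: the review summary mentions Resource, Generator, and Metric types, but the field names, JSON tags, and the MetricGauge and testPlanResource helpers below are assumptions inferred from the YAML keys, not the operator's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GroupVersionKind identifies the custom resource to collect.
type GroupVersionKind struct {
	Group   string `json:"group"`
	Version string `json:"version"`
	Kind    string `json:"kind"`
}

// MetricGauge is a hypothetical helper holding the path to the value.
type MetricGauge struct {
	Path []string `json:"path"`
}

// Metric describes how one metric value is extracted (assumed shape).
type Metric struct {
	Type  string       `json:"type"`
	Gauge *MetricGauge `json:"gauge,omitempty"`
}

// Generator names a metric and says how to build it (assumed shape).
type Generator struct {
	Name string `json:"name"`
	Help string `json:"help"`
	Each Metric `json:"each"`
}

// Resource is one entry of the collectCustomResources list (assumed shape).
type Resource struct {
	GroupVersionKind GroupVersionKind `json:"groupVersionKind"`
	Metrics          []Generator      `json:"metrics"`
}

// testPlanResource rebuilds the entry used in the test plan.
func testPlanResource() Resource {
	return Resource{
		GroupVersionKind: GroupVersionKind{
			Group:   "datadoghq.com",
			Version: "v2alpha1",
			Kind:    "DatadogAgent",
		},
		Metrics: []Generator{{
			Name: "uptodateagents",
			Help: "Number of up-to-date agents",
			Each: Metric{
				Type:  "gauge",
				Gauge: &MetricGauge{Path: []string{"status", "agent", "upToDate"}},
			},
		}},
	}
}

// marshalResource encodes a Resource as compact JSON.
func marshalResource(r Resource) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalResource(testPlanResource()))
}
```

The gauge path status, agent, upToDate is the sequence of keys to follow inside the custom resource to reach the value reported by the metric.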

And validate that the corresponding custom resource metric is properly emitted:

$ kubectl exec pod/dda-with-operator-cluster-agent-79988798b9-9t7f7 -c cluster-agent -- env -u DD_LOG_LEVEL agent check -r kubernetes_state_core --table
[…]
  kubernetes_state_customresource.uptodateagents             gauge  1755816713  1             kube_cluster_name:lenaic-kind, customresource_group:datadoghq.com, customresource_kind:DatadogAgent, customresource_version:v2alpha1, orch_cluster_id:596fb22b-2124-4b0a-91a9-dca1b37078f8

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@L3n41c L3n41c added this to the v1.16.0 milestone May 23, 2025
@L3n41c L3n41c added the enhancement New feature or request label May 23, 2025
@L3n41c L3n41c force-pushed the lenaic/CONTINT-4643 branch from 0bd388f to d0de66f Compare May 23, 2025 14:38
@L3n41c L3n41c force-pushed the lenaic/CONTINT-4643 branch from d0de66f to 16e29f7 Compare May 23, 2025 16:06
@codecov-commenter

codecov-commenter commented May 26, 2025

Codecov Report

❌ Patch coverage is 92.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.10%. Comparing base (977f372) to head (1188d65).

Files with missing lines                               Patch %  Lines
...agent/feature/kubernetesstatecore/indent_writer.go  76.00%   4 Missing and 2 partials ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1927      +/-   ##
==========================================
+ Coverage   38.97%   39.10%   +0.13%     
==========================================
  Files         253      254       +1     
  Lines       26208    26275      +67     
==========================================
+ Hits        10214    10275      +61     
- Misses      15386    15390       +4     
- Partials      608      610       +2     
Flag       Coverage Δ
unittests  39.10% <92.00%> (+0.13%) ⬆️

Files with missing lines                               Coverage Δ
...adogagent/feature/kubernetesstatecore/configmap.go  96.20% <100.00%> (+0.96%) ⬆️
...atadogagent/feature/kubernetesstatecore/feature.go  71.52% <100.00%> (+0.40%) ⬆️
...r/datadogagent/feature/kubernetesstatecore/rbac.go  100.00% <100.00%> (ø)
...agent/feature/kubernetesstatecore/indent_writer.go  76.00% <76.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@L3n41c L3n41c force-pushed the lenaic/CONTINT-4643 branch from bef05e6 to 03a75b1 Compare June 2, 2025 15:14
@levan-m levan-m modified the milestones: v1.16.0, v1.17.0 Jun 6, 2025
@levan-m levan-m modified the milestones: v1.17.0, v1.18.0 Jul 9, 2025
@levan-m levan-m modified the milestones: v1.18.0, v1.19.0 Aug 13, 2025
@L3n41c L3n41c force-pushed the lenaic/CONTINT-4643 branch 2 times, most recently from d2757df to ac5089f Compare August 21, 2025 22:16
@L3n41c L3n41c force-pushed the lenaic/CONTINT-4643 branch from ac5089f to 654136f Compare August 21, 2025 22:20
@L3n41c L3n41c marked this pull request as ready for review August 21, 2025 22:55
@L3n41c L3n41c requested review from a team as code owners August 21, 2025 22:55
@L3n41c L3n41c requested a review from Copilot August 21, 2025 22:55

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for configuring custom resource metrics collection in the Kubernetes State Metrics (KSM) core check. It introduces a new collectCustomResources field to the DatadogAgent specification, allowing users to define custom resources and their associated metrics that should be collected by the kube-state-metrics core check.

Key changes include:

  • Addition of new API types for custom resource configuration including Resource, Generator, and Metric types
  • Implementation of RBAC permissions generation for custom resources
  • Integration of custom resource configuration into the KSM check configuration YAML
  • Comprehensive test coverage for the new functionality

Reviewed Changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated no comments.

Summary per file:

  • api/datadoghq/v2alpha1/datadogagent_types.go: Defines new API types for custom resource metrics configuration
  • internal/controller/datadogagent/feature/kubernetesstatecore/feature.go: Integrates custom resource collection into the KSM feature
  • internal/controller/datadogagent/feature/kubernetesstatecore/rbac.go: Generates RBAC permissions for custom resources
  • internal/controller/datadogagent/feature/kubernetesstatecore/configmap.go: Updates KSM configuration to include custom resources
  • internal/controller/datadogagent/feature/kubernetesstatecore/indent_writer.go: Utility for proper YAML indentation
  • test/e2e/manifests/datadog-agent-ccr-enabled.yaml: E2E test manifest with custom resource configuration
  • test/e2e/tests/k8s_suite/k8s_suite_test.go: E2E test validation for custom resource metrics

// +optional
Conf *CustomConfig `json:"conf,omitempty"`

// `CollectCustomResources` defines custom resources for the kube-state-metrics core check to collect.
Contributor

Can we add config instructions for []resource in this description so that it'll be populated in the docs?

Member Author

I’ve added a link to the documentation describing the fields of this structure: 28c60bd#diff-bb61879086cb5d8e4bd2d033770ca5943d58d914262b1c1f6cab53ad5a1b263a.

// `CollectCustomResources` defines custom resources for the kube-state-metrics core check to collect.
// +optional
// +listType=atomic
CollectCustomResources []Resource `json:"collectCustomResources,omitempty"`
Contributor

Since we are investing in simplifying migration, could we align the configs as closely as possible with Helm?
https://github.com/DataDog/helm-charts/blob/ff7cf3fbee4c440bc493a105fae27a2e2c1c3958/charts/datadog/values.yaml#L178-L185

Member Author

I’ve renamed the field to make it match the Helm configuration in 28c60bd.

Comment on lines 838 to 843
// GroupVersionKind is the Kubernetes group, version, and kind of a resource.
type GroupVersionKind struct {
	Group   string `json:"group" yaml:"group"`
	Version string `json:"version" yaml:"version"`
	Kind    string `json:"kind" yaml:"kind"`
}

Could this struct be imported from the Kubernetes source code?
It will never change, but importing it would be simpler.

Member Author

Here is the definition of GroupVersionKind in apimachinery:
https://github.com/kubernetes/apimachinery/blob/268a6d0fb19c92c9665e8f5cd85b564557038dc1/pkg/apis/meta/v1/group_version.go#L82-L90.
Unfortunately, it lacks the yaml tags that we need here to embed a GroupVersionKind in a custom resource.
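
The difference is visible in the struct tags themselves. The sketch below uses only the standard library reflect package to look up the yaml tag on two structs. upstreamLikeGVK is a hypothetical stand-in that only approximates the apimachinery definition (its tags are an assumption, so check the linked source for the exact ones), localGVK mirrors the struct added in this PR, and yamlTags is an illustrative helper.

```go
package main

import (
	"fmt"
	"reflect"
)

// upstreamLikeGVK approximates the apimachinery struct: json and protobuf
// tags, but no yaml tag. The exact tags are an assumption.
type upstreamLikeGVK struct {
	Group   string `json:"group" protobuf:"bytes,1,opt,name=group"`
	Version string `json:"version" protobuf:"bytes,2,opt,name=version"`
	Kind    string `json:"kind" protobuf:"bytes,3,opt,name=kind"`
}

// localGVK mirrors the struct from this PR, with both json and yaml tags.
type localGVK struct {
	Group   string `json:"group" yaml:"group"`
	Version string `json:"version" yaml:"version"`
	Kind    string `json:"kind" yaml:"kind"`
}

// yamlTags maps each field name of the struct value v to the value of its
// yaml tag, or to the empty string when the field has no yaml tag.
func yamlTags(v interface{}) map[string]string {
	t := reflect.TypeOf(v)
	tags := make(map[string]string, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tags[f.Name] = f.Tag.Get("yaml")
	}
	return tags
}

func main() {
	fmt.Println("upstream-like:", yamlTags(upstreamLikeGVK{}))
	fmt.Println("local:", yamlTags(localGVK{}))
}
```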

Comment on lines +143 to +147
// Use the resource plural if specified, otherwise derive it from the Kind
resourceName := cr.ResourcePlural
if resourceName == "" {
	resourceName = strings.ToLower(flect.Pluralize(cr.GroupVersionKind.Kind))
}

If we lowercase the plural name we build here, should we also lowercase the plural name we read from the custom resource? Is it always lowercase by rule in Kubernetes?

Member Author

This comes from the core Kubernetes concepts.

The GVK (group, version, kind) is what we find inside resource manifests. The kind is the object type. It’s capitalized and singular.
For example, a resource contains:

apiVersion: apps/v1
kind: Deployment

The GVR (group, version, resource) is what is used in RESTful API paths. The resource is lowercase and plural.
For example, the API path to operate on the above resource is:

/apis/apps/v1/namespaces/{ns}/deployments

The logic here is the same as the one implemented in kube-state-metrics upstream: https://github.com/kubernetes/kube-state-metrics/blob/e332175940b8d8f76648c24d79c1e0d59a9b7926/pkg/customresourcestate/config.go#L82-L90
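
The mapping from GVK to GVR described above can be sketched in Go. This is a minimal, standard-library-only illustration: naivePluralize is a hypothetical stand-in for flect.Pluralize that only handles a few regular English endings, and resourceName and namespacedPath are illustrative helpers, not functions from the operator or from kube-state-metrics.

```go
package main

import (
	"fmt"
	"strings"
)

// GroupVersionKind is what appears in a manifest (apiVersion and kind).
type GroupVersionKind struct {
	Group   string
	Version string
	Kind    string
}

// naivePluralize is a simplified stand-in for flect.Pluralize.
// It covers only a few regular endings and is not a general pluralizer.
func naivePluralize(word string) string {
	n := len(word)
	switch {
	case strings.HasSuffix(word, "s") || strings.HasSuffix(word, "x"):
		return word + "es"
	case n > 1 && word[n-1] == 'y' && !strings.ContainsRune("aeiou", rune(word[n-2])):
		return word[:n-1] + "ies"
	default:
		return word + "s"
	}
}

// resourceName uses the explicit plural when provided and otherwise
// derives a lowercase plural from the Kind.
func resourceName(gvk GroupVersionKind, plural string) string {
	if plural != "" {
		return plural
	}
	return strings.ToLower(naivePluralize(gvk.Kind))
}

// namespacedPath builds the REST path for a namespaced resource.
// The core group (empty name) lives under /api instead of /apis.
func namespacedPath(gvk GroupVersionKind, resource, namespace string) string {
	if gvk.Group == "" {
		return fmt.Sprintf("/api/%s/namespaces/%s/%s", gvk.Version, namespace, resource)
	}
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s", gvk.Group, gvk.Version, namespace, resource)
}

func main() {
	gvk := GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"}
	fmt.Println(namespacedPath(gvk, resourceName(gvk, ""), "default"))
}
```

An explicit plural matters for kinds whose plural is irregular, where any automatic derivation may guess wrong.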

@levan-m levan-m modified the milestones: v1.19.0, v1.20.0 Sep 24, 2025
@L3n41c
Member Author

L3n41c commented Oct 13, 2025

/merge

@dd-devflow-routing-codex

dd-devflow-routing-codex bot commented Oct 13, 2025

2025-10-13 12:03:09 UTC ℹ️ Start processing command /merge


2025-10-13 12:03:15 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 0s (p90).


2025-10-13 14:03:41 UTC MergeQueue: The build pipeline has timeout

The merge request has been interrupted because the build 79133870 took longer than expected. The current limit for the base branch 'main' is 120 minutes.

8 participants