-
Notifications
You must be signed in to change notification settings - Fork 768
feat: Add package registry to eck #8800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
🔍 Preview links for changed docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a quick look from the side of the team maintaining Package Registry.
It looks great, thanks for adding support for package registry in ECK, this will help many users.
Added some comments, please let us know if you need a more in-depth review from our side.
AgentImage Image = "elastic-agent/elastic-agent" | ||
MapsImage Image = "elastic-maps-service/elastic-maps-server" | ||
LogstashImage Image = "logstash/logstash" | ||
PackageRegistryImage Image = "package-registry/distribution" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an image used by default, or setting the image is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the default version is package-registry/distribution:<version>
where the user specifies the version. Version is a required field so if they do not specify one it will fail. Something I thought about was adding a epr_type
or something like that where the user could specify the different EPR versions we publish. Like 9.1.2, lite-9.1.2, production, and lite. However I think its just as easy to specify the image in the template if you want something other than package-registry/distribution:<version>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing a first test I just realized that the default image is around ~5Gb 😅 (I was wondering why my container was not starting 🙃 ). I'm also wondering if it would not make sense to have short flag to select the image "type".
Edit: I just had a Pod that failed to start with the following error on GKE:
Warning Failed 90s kubelet Failed to pull image "docker.elastic.co/package-registry/distribution:9.1.0": failed to pull and unpack image "docker.elastic.co/package-registry/distribution:9.1.0": failed to extract layer sha256:0f0888ef6ac576c67e3a9acf8ec7216533b7f3144aeb14c9b93d0db9469830cd: write /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/889/fs/packages/package-storage/security_detection_engine-9.0.9-beta.1.zip: no space left on device: unknown
I also had disk pressure conditions. I think the image size should be highlighted in the documentation so that K8s nodes can handle it.
Edit 2: This maybe also means that we need to check the disk size on the nodes used for our e2e tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsoriano do you think we should change the default image to be something smaller? I selected this image because it is what we recommend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the lite images were added as smaller images for this kind of use cases, but even this image is starting to be too big.
There are vanilla images that don't contain any package. They fail to start if no directory with packages is configured, but they can also be configured in proxy mode, to forward requests for example to the public EPR.
Latest of these images is docker.elastic.co/package-registry/package-registry:v1.31.1
.
There is an open issue about allowing to start even if no package is available yet: elastic/package-registry#1351.
More about the proxy mode in https://github.com/elastic/package-registry/?tab=readme-ov-file#proxy-mode.
We also have a WIP to create custom distributions, and smaller images elastic/package-registry#1335.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds Elastic Package Registry (EPR) support to ECK, introducing a new CRD for deploying EPR instances and enabling Kibana to reference EPR instances for Fleet package management.
- Adds
ElasticPackageRegistry
CRD with controller to manage EPR deployments - Enables Kibana to associate with EPR instances via
packageRegistryRef
field - Implements TLS certificate handling and CA mounting for secure communication between Kibana and EPR
Reviewed Changes
Copilot reviewed 60 out of 61 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
pkg/apis/epr/v1alpha1/ |
New API definitions for ElasticPackageRegistry CRD |
pkg/controller/packageregistry/ |
Controller implementation for managing EPR resources |
pkg/controller/association/controller/kibana_epr.go |
Association controller for Kibana-EPR relationships |
pkg/apis/kibana/v1/kibana_types.go |
Adds packageRegistryRef field and EPR association support |
pkg/controller/kibana/ |
Updates Kibana controller to handle EPR associations and CA certificates |
test/e2e/ |
E2E tests for EPR functionality and associations |
Comments suppressed due to low confidence (1)
pkg/controller/kibana/pod_test.go:1
- The comment on line 67 says 'readinessProbe is the readiness probe for the maps container' but this function is in the packageregistry controller and should refer to the package registry container.
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
@tehbooom 👋 please let me know when you need another review, thanks! |
Added EPR to ECK diagnostics here. I still need to update our documentation. I know we changed how we do documentation so if you could please point me in the right direction where ECK docs live that would be great. This PR is ready for another review. Thanks! |
Documentation has moved here: https://github.com/elastic/docs-content Please note that the I'll try to take another look to your PR this week, I'm struggling to keep up with the pace of PRs opened in this repo 😅 |
buildkite test this -f p=gke,E2E_TAGS=epr |
buildkite test this -f p=gke,E2E_TAGS=epr |
E2E tests are still failing with:
We also have to find a solution for this. IIUC elastic/package-registry#1335 would be the way to go? I'll also try to check if we can increase the disk on GKE nodes, but it would still mean that we may have to skip other providers (AWS, Azure, Kind...). IIUC the image currently requires ~14Gi:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pods are being recreated in my dev env because of changes in packageregistry.k8s.elastic.co/config-hash
while I do nothing:
Edit: Nevermind, it's because I used short lived certificates to test certificate rotation, and it seems it requires the Pod to be recreated. @tehbooom Could you confirm that certificates are not hot reloaded?
if [[ -f "` + volume.EPRCACertPath + `" ]]; then | ||
echo "EPR CA certificate found, checking for user-provided CA bundle..." | ||
# Check if user provided their own NODE_EXTRA_CA_CERTS (from init container env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make things easier to implement and debug if we ask the users to not use NODE_EXTRA_CA_CERTS
but instead allow them to add custom certificates using an alternate env. variable, for example KIBANA_EXTRA_CA_CERTS
(I'm asking because the way we handle NODE_EXTRA_CA_CERTS
feels a bit "hacky" to me tbh)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined NODE_EXTRA_CA_CERTS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined
NODE_EXTRA_CA_CERTS
Agreed. However, if we did use KIBANA_EXTRA_CA_CERTS
what would it make it easier to debug and implement? You are still checking if the user provided extra CAs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial look at this I failed to submit previously.
secretName: | ||
description: |- | ||
SecretName is the name of an existing Kubernetes secret that contains connection information for associating an | ||
Elastic resource not managed by the operator. | ||
The referenced secret must contain the following: | ||
- `url`: the URL to reach the Elastic resource | ||
- `username`: the username of the user to be authenticated to the Elastic resource | ||
- `password`: the password of the user to be authenticated to the Elastic resource | ||
- `ca.crt`: the CA certificate in PEM format (optional) | ||
- `api-key`: the key to authenticate against the Elastic resource instead of a username and password (supported only for `elasticsearchRefs` in AgentSpec and in BeatSpec) | ||
This field cannot be used in combination with the other fields name, namespace or serviceName. | ||
type: string | ||
serviceName: | ||
description: |- | ||
ServiceName is the name of an existing Kubernetes service which is used to make requests to the referenced | ||
object. It has to be in the same namespace as the referenced resource. If left empty, the default HTTP service of | ||
the referenced resource is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
secretName is not needed. Is there a another way to do an association without commonv1.ObjectSelector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is an acceptable compromise to use ObjectSelector here even if it has elements that do not apply. The alternative would be to introduce a custom struct and then translate to ObjectSelector in the functions required by the Association interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we okay with resolving this even though it doesnt need secretName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks reasonable with one exception: I am also not a big fan of of the NODE_EXTRA_CA_CERTS init container hackery but don't see a good alternative right now if we want to be backwards compatible.
Main blocker to merge this imo is the lack of UBI images for the package registry.
I could not get the recommended default image to work, as the k8s nodes in my environment ran out of ephemeral storage while trying to pull the image. I wonder how viable this image is for any environment? I was successful with the lite
image though.
deploy/eck-stack/charts/eck-package-registry/templates/epr.yaml
Outdated
Show resolved
Hide resolved
if [[ -f "` + volume.EPRCACertPath + `" ]]; then | ||
echo "EPR CA certificate found, checking for user-provided CA bundle..." | ||
# Check if user provided their own NODE_EXTRA_CA_CERTS (from init container env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined NODE_EXTRA_CA_CERTS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ive update the test image to be the lite
image since others have stated it works with that image
############################# | ||
# Handle EPR CA certificate consolidation with user-provided NODE_EXTRA_CA_CERTS | ||
# This must run before the early exit to ensure certificates are always processed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the scenario as its been sometime. But putting this after the check to see if it was already initialized would give me x509 errors. I think it was if I added EPR after Kibana was already deployed it wouldnt pick up the new certificate.
if [[ -f "` + volume.EPRCACertPath + `" ]]; then | ||
echo "EPR CA certificate found, checking for user-provided CA bundle..." | ||
# Check if user provided their own NODE_EXTRA_CA_CERTS (from init container env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined
NODE_EXTRA_CA_CERTS
Agreed. However, if we did use KIBANA_EXTRA_CA_CERTS
what would it make it easier to debug and implement? You are still checking if the user provided extra CAs.
secretName: | ||
description: |- | ||
SecretName is the name of an existing Kubernetes secret that contains connection information for associating an | ||
Elastic resource not managed by the operator. | ||
The referenced secret must contain the following: | ||
- `url`: the URL to reach the Elastic resource | ||
- `username`: the username of the user to be authenticated to the Elastic resource | ||
- `password`: the password of the user to be authenticated to the Elastic resource | ||
- `ca.crt`: the CA certificate in PEM format (optional) | ||
- `api-key`: the key to authenticate against the Elastic resource instead of a username and password (supported only for `elasticsearchRefs` in AgentSpec and in BeatSpec) | ||
This field cannot be used in combination with the other fields name, namespace or serviceName. | ||
type: string | ||
serviceName: | ||
description: |- | ||
ServiceName is the name of an existing Kubernetes service which is used to make requests to the referenced | ||
object. It has to be in the same namespace as the referenced resource. If left empty, the default HTTP service of | ||
the referenced resource is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
secretName is not needed. Is there a another way to do an association without commonv1.ObjectSelector
?
buildkite test this -f p=gke,E2E_TAGS=epr |
Elastic Package Registry (EPR) has been highly requested to be added to ECK.
EPR does not have any references since it does not require a license nor any other application.
The following was implemented for EPR
xpack.fleet.registryUrl
and set the environment variableNODE_EXTRA_CA_CERTS
to the path of EPR's CA which is mountedNODE_EXTRA_CA_CERTS
with a mount the controller will combine the certs appending the EPR's CA to the users specified CAThis was tested with and without setting
NODE_EXTRA_CA_CERTS
using the below manifest