Skip to content

Conversation

@SungJin1212
Copy link
Member

@SungJin1212 SungJin1212 commented Oct 15, 2025

Add a -name-validation-scheme flag for supporting UTF-8 on Cortex
We offer two encoding ways, legacy or utf8.
The utf8 mode enables UTF-8 encoding for validating label names and metric names.

  • Distributor: Validate the series according to UTF-8 validation logic.
  • Ruler: Accept rule groups that contain UTF-8 labels
  • Alertmanager: The API allows UTF-8 labels.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@SungJin1212 SungJin1212 force-pushed the Add-changelog-for-utf8-support branch 2 times, most recently from 7339ddb to 055e6ee Compare October 16, 2025 02:42
require.Equal(t, 200, res.StatusCode)
require.Equal(t, model.ValVector, queryResult.Type())
vec := queryResult.(model.Vector)
require.Equal(t, 1, len(vec))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to cover label names and label values API as well. But we can do it in next PR

Copy link
Member Author

@SungJin1212 SungJin1212 Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove https://github.com/cortexproject/cortex/blob/master/pkg/cortex/configinit/init.go as we start accepting UTF-8?

I've noticed the labelValues api with UTF-8 label name fails since we currently set the legacy scheme as default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should. I think we might be missing some test cases so we didn't find this in previous PR. Any components that do label/metric validation needs to accept the validation scheme otherwise you modify the global validation scheme variable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it and added label names/values e2e test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing more test cases like alertmanager and even ruler configurations with UTF 8 labels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added commits, testing the UTF-8 labels for Ruler and Alertmanager in the e2e test

rulerFlags := mergeFlags(
BlocksStorageFlags(),
RulerFlags(),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this test case.


silenceId, err := c.CreateSilence(ctx, types.Silence{
Matchers: amlabels.Matchers{
{Name: "silences.name.🙂", Value: "silences.value.🙂"},
Copy link
Contributor

@yeya24 yeya24 Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this test even succeed. We don't even add config to enable UTF 8 labels. Does it mean that AM support UTF 8 labels by default? but I think this is wrong and a behavior change after we remove /pkg/cortex/configinit/init.go. The UTF 8 validation should be properly controlled by the new runtime config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Alertmanager is able to control which validation mode to use via InitFromFlags. https://github.com/prometheus/alertmanager/blob/main/matcher/compat/parse.go#L59C6-L59C19.

However, this is not per tenant and global. That's why I was asking if per tenant metric name validation scheme really works or not. Seems we need some code changes to make that configuration not global in alertmanager.

Also you may add more utf-8 configs to route's matchers? Adding it to group_by alone Idk if it is sufficient.

Copy link
Member Author

@SungJin1212 SungJin1212 Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Alertmanager Cortex uses this code, so the test succeeds since the default value of the scheme is UTF-8.

// isValidClassicLabelName returns true if the string is a valid classic label name.
func isValidClassicLabelName(_ *slog.Logger) func(model.LabelName) bool {
	return func(name model.LabelName) bool {
		return name.IsValid()
	}
}

The upstream Alertmanager code is

func isValidClassicLabelName(_ *slog.Logger) func(model.LabelName) bool {
	return func(name model.LabelName) bool {
		return model.LegacyValidation.IsValidLabelName(string(name))
	}
}

So, we need to expose another UTF-8 config for the Alertmanager.

That's why I was asking if per per-tenant metric name validation scheme really works or not.

It seems we should apply a per-tenant scheme for external label validation in the Ruler and Distributor, but the configuration for the Alertmanager should be applied globally. WDYT?

@SungJin1212 SungJin1212 force-pushed the Add-changelog-for-utf8-support branch 6 times, most recently from 76b0dbb to 5e6abe6 Compare October 22, 2025 09:29
@SungJin1212 SungJin1212 changed the title Add-changelog-for-utf8-support Support UTF-8 Oct 23, 2025
@SungJin1212 SungJin1212 force-pushed the Add-changelog-for-utf8-support branch from 5e6abe6 to 8bd64bb Compare October 23, 2025 01:57
@SungJin1212 SungJin1212 force-pushed the Add-changelog-for-utf8-support branch from 69a8130 to 0f8f7fa Compare October 23, 2025 08:05
@yeya24 yeya24 changed the title Support UTF-8 Support UTF-8 name Oct 23, 2025
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yeya24 yeya24 merged commit 6d3700e into cortexproject:master Oct 23, 2025
33 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants