ICU-20392 Split the Locale payload into nested and heap allocated #3518

roubert · 2025-06-06T12:21:04Z

The most commonly used Locale objects have very little payload: most of them use no extensions, no language subtag longer than 3 characters, and no more than a single variant.

There's room for all that data in a simple 32-byte payload object, which can be nested directly in the Locale object.

Any payload larger than that can instead be heap allocated as needed, in order to save storage for the most commonly used objects while retaining the ability to create arbitrarily large and complex Locale objects.
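As a rough illustration of the split described above (not the code in this PR), the three states can be modelled with `std::variant`. The `Nest` and `Heap` types, the `makePayload()` and `getId()` helpers, and the purely length-based fit test are all assumptions made for this sketch; the actual criteria are the ones listed above (extensions, language length, number of variants).

```cpp
#include <memory>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical nested payload: a fixed 32-byte buffer holding a short,
// NUL-terminated locale ID.
struct Nest {
    char id[32];
};

// Hypothetical heap payload for anything that does not fit in Nest.
struct Heap {
    std::string id;
};

// Three states: bogus (monostate), nested, heap allocated.
using Payload = std::variant<std::monostate, Nest, std::unique_ptr<Heap>>;

// Assumption: "fits" is decided by total length only, to keep the sketch short.
Payload makePayload(std::string_view id) {
    if (id.size() < sizeof(Nest::id)) {  // leave room for the terminating NUL
        Nest nest{};                     // zero-filled, so already terminated
        id.copy(nest.id, id.size());
        return nest;
    }
    return std::make_unique<Heap>(Heap{std::string(id)});
}

// Returns the stored ID, or an empty view for the bogus state.
std::string_view getId(const Payload& payload) {
    if (const Nest* nest = std::get_if<Nest>(&payload)) {
        return nest->id;
    }
    if (const auto* heap = std::get_if<std::unique_ptr<Heap>>(&payload)) {
        return (*heap)->id;
    }
    return {};
}
```

With this sketch, `makePayload("en_US")` stays nested, while a long ID with several keywords ends up behind the `std::unique_ptr<Heap>`.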

This reduces the storage requirements for all Locale objects.

For nested payloads, this reduction is from 224 bytes to 48 bytes.

For payloads that need to be heap allocated, the reduction depends on several factors, but in most cases there is some reduction. There are also cases where this refactoring actually increases the storage used, because CharString allocates more storage than necessary. This could be improved in a number of ways, such as optimizing CharString to not allocate more than necessary when copying a string of known length, not allocating any empty CharString objects, or possibly replacing CharString with a new class for fixed-length strings.

The public API remains unchanged, but the set of operations that can lead to U_MEMORY_ALLOCATION_ERROR changes.
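To make the last option concrete, here is a minimal sketch of what a fixed-length string class could look like. `FixedString` and its members are hypothetical names, not an existing ICU class. The sketch also uses `std::make_unique`, which throws on allocation failure, whereas real ICU code would report U_MEMORY_ALLOCATION_ERROR instead.

```cpp
#include <cstddef>
#include <memory>
#include <string_view>

// Hypothetical exact-size string: allocates length + 1 bytes once and never
// grows. Empty strings allocate nothing at all.
class FixedString {
public:
    FixedString() = default;

    explicit FixedString(std::string_view s) {
        if (s.empty()) {
            return;  // no allocation for empty strings
        }
        // make_unique<char[]> value-initializes, so the final byte is already NUL.
        data_ = std::make_unique<char[]>(s.size() + 1);
        s.copy(data_.get(), s.size());
        length_ = s.size();
    }

    const char* c_str() const { return data_ ? data_.get() : ""; }
    std::size_t length() const { return length_; }

    // Bytes of heap storage owned by this object.
    std::size_t allocatedBytes() const { return data_ ? length_ + 1 : 0; }

private:
    std::unique_ptr<char[]> data_;
    std::size_t length_ = 0;
};
```

The point of the design is that the heap cost tracks the string length exactly, which covers both the "known length" and the "no empty objects" ideas above.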

Checklist

Required: Issue filed: ICU-20392
Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
Issue accepted (done by Technical Committee after discussion)
Tests included, if applicable
API docs and/or User Guide docs changed or added, if applicable

ALLOW_MANY_COMMITS=true

markusicu

Very nice!

Except, I was hoping for sizeof(Locale) to go down even more.

I think with these changes you are getting 48 bytes on a 64-bit machine for:

- vtable pointer
- variant payload
  - variant discriminator
  - union of Nest / unique_ptr, 32B and 8B-aligned for the pointer

This makes the variant discriminator take up 8 bytes (because of the alignment/padding), distinguishing three states (bogus, Nest, unique_ptr).
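The arithmetic can be checked with a small sketch. `Nest`, `Heap`, `PayloadVariant` and `LocaleLike` are hypothetical stand-ins, and the expected sizes assume a typical 64-bit ABI with a common standard library implementation; the checks are skipped on other pointer sizes.

```cpp
#include <cstddef>
#include <memory>
#include <variant>

struct Nest { char bytes[32]; };                  // hypothetical 32-byte nested payload
struct Heap { char* data; std::size_t length; };  // hypothetical heap payload

using PayloadVariant = std::variant<std::monostate, Nest, std::unique_ptr<Heap>>;

// Stand-in for a class with a vtable pointer plus the variant payload.
struct LocaleLike {
    virtual ~LocaleLike() = default;
    PayloadVariant payload;
};

constexpr bool kIs64Bit = sizeof(void*) == 8;

// The unique_ptr alternative forces 8-byte alignment on the whole variant.
static_assert(!kIs64Bit || alignof(PayloadVariant) == 8, "8-byte alignment expected");
// 32 bytes of storage + a small discriminator, padded up to the next multiple of 8.
static_assert(!kIs64Bit || sizeof(PayloadVariant) == 40, "32B union + padded discriminator");
// 8-byte vtable pointer + 40-byte variant.
static_assert(!kIs64Bit || sizeof(LocaleLike) == 48, "8B vtable pointer + 40B variant");
```

The discriminator itself only needs a byte or so; the remaining bytes up to 8 are padding that a hand-written layout could reuse.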

Idea: Replace the std::variant with an explicit/manual one, using a union for the heap pointer vs. some other fields.

enum Which : uint8_t { BOGUS, NEST, HEAP };
struct Payload {
    union {
        struct {
            char language[4];
            char region[4];
        } langRegion;
        std::unique_ptr<Heap> heapPtr;
    } langRegionOrHeap;
    Which which : 2;
    uint8_t variantBegin : 6;  // 6 to fill the byte, 5 would suffice
    char script[5];
    char baseName[18];

    Payload() {
        which = BOGUS;
    }
    ~Payload() {
        // if which == HEAP: release langRegionOrHeap.heapPtr
    }
    ...
};

That should get us to 8B vtable + 32B payload = 40B.
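A compilable variation of the sketch above, to check the 32B + 8B estimate. This is an illustration under assumptions, not code from the PR: `Heap` and `LocaleLike` are hypothetical stand-ins, the bit-fields are declared as `uint8_t` rather than `Which`, and copying is simply disabled. Because a union with a `std::unique_ptr` member has deleted default special members, the union gets explicit (empty) ones and `Payload` tracks which member is alive.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for the heap-allocated payload.
struct Heap {
    std::string fullName;
};

enum Which : uint8_t { BOGUS, NEST, HEAP };

struct Payload {
    union LangRegionOrHeap {
        struct {
            char language[4];
            char region[4];
        } langRegion;
        std::unique_ptr<Heap> heapPtr;
        // Payload decides which member is alive, so these do nothing extra.
        LangRegionOrHeap() : langRegion{} {}
        ~LangRegionOrHeap() {}
    } langRegionOrHeap;
    uint8_t which : 2;         // holds a Which value
    uint8_t variantBegin : 6;  // shares one byte with `which`
    char script[5];
    char baseName[18];

    Payload() : which(BOGUS), variantBegin(0), script{}, baseName{} {}

    explicit Payload(std::unique_ptr<Heap> heap) : Payload() {
        // langRegion is trivial, so the pointer can be constructed over it.
        new (&langRegionOrHeap.heapPtr) std::unique_ptr<Heap>(std::move(heap));
        which = HEAP;
    }

    Payload(const Payload&) = delete;
    Payload& operator=(const Payload&) = delete;

    ~Payload() {
        if (which == HEAP) {
            std::destroy_at(&langRegionOrHeap.heapPtr);  // releases the Heap
        }
    }

    const Heap& heap() const {
        assert(which == HEAP);
        return *langRegionOrHeap.heapPtr;
    }
};

// Stand-in for a class with a vtable pointer plus the payload.
struct LocaleLike {
    virtual ~LocaleLike() = default;
    Payload payload;
};

// Expected on typical 64-bit ABIs; skipped for other pointer sizes.
static_assert(sizeof(void*) != 8 || sizeof(Payload) == 32, "8 + 1 + 5 + 18 bytes");
static_assert(sizeof(void*) != 8 || sizeof(LocaleLike) == 40, "8B vtable + 32B payload");
```

The cost of the manual union is visible here: construction, destruction and any future copy or move support must keep `which` and the active union member in sync by hand.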

I think that shaving off another 8B in every Locale instance is worth some localized fiddling with the discriminator & union.

WDYT?

icu4c/source/common/locid.cpp

roubert · 2025-08-06T14:44:34Z

For the first step here, I'd very much like to use the well-tested and type-safe standard library std::variant<>, as that means less new code that needs to be written, adding no risk of introducing any additional bugs.

The next most important optimization after that would then be to eliminate the storage currently wasted by CharString allocating more than needed for strings of known size.

With that done, these changes will have resulted in significant size reductions for all kinds of Locale objects, finally making them small enough to use everywhere and making it possible to then start removing workarounds (such as ICU-23005) that currently use different kinds of strings instead of using Locale objects directly (saving some storage at the cost of having to repeatedly re-parse these strings).

At that point the primary problem will have been fully solved and then we can start looking into further improvements. I myself would then like to start out by taking a critical look at Locale::initBaseName(), which seems overly convoluted and maybe could be eliminated altogether by some thoughtful refactoring. And then, sure, replacing std::variant<> with something specialized could both save a few more bytes and be quite interesting to implement.

markusicu · 2025-08-06T20:53:04Z

> For the first step here, I'd very much like to use the well-tested and type-safe standard library std::variant<>, as that means less new code that needs to be written, adding no risk of introducing any additional bugs.

Ok for a first step / first PR.

> The next most important optimization after that would then be to eliminate the storage currently wasted by CharString allocating more than needed for strings of known size.

I partially disagree. I think that making the Locale object smaller for commonly used locale IDs is most important.
I am ok if that's in a second PR.

CharString is "only" used for less-common locale IDs. sizeof(Heap) should be reduced after sizeof(Locale).

I agree that then there are other opportunities, such as making the parser/initBaseName() less convoluted, and even storing keywords in an at least slightly structured way, so that we need not traipse through the string all the time.

jira-pull-request-webhook · 2025-08-07T17:42:56Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

roubert force-pushed the 20392 branch 12 times, most recently from e348eb0 to 33362b4 Compare June 10, 2025 14:40

unicode-org deleted a comment from jira-pull-request-webhook bot Jul 22, 2025

roubert force-pushed the 20392 branch from 33362b4 to 2d52f77 Compare July 22, 2025 13:12

unicode-org deleted a comment from jira-pull-request-webhook bot Jul 22, 2025

roubert marked this pull request as ready for review July 22, 2025 13:41

roubert requested a review from markusicu July 22, 2025 13:41

roubert assigned markusicu Jul 22, 2025

markusicu reviewed Aug 5, 2025

markusicu approved these changes Aug 7, 2025

roubert added 3 commits August 7, 2025 19:42

ICU-20392 Use getName() instead of private member fullName directly.

c116d22

ICU-20392 Use isBogus() instead of private member fIsBogus directly.

207c7b1

roubert force-pushed the 20392 branch from d74cc73 to 166616f Compare August 7, 2025 17:42

roubert merged commit 00c199b into unicode-org:main Aug 7, 2025
94 checks passed

roubert deleted the 20392 branch August 7, 2025 18:07
