Sharded String Interner

A high-performance sharded string interning system with a low-contention concurrency design, custom memory resources, and chunk-based allocation. This project is part of the EXOTIC::memory / EXOTIC::intern ecosystem.

Overview

This project was originally intended to be integrated into a compiler, but it remains a fully generic system that can be used in other performance-critical environments.

It took approximately three weeks to complete during an end-of-term period, from initial design to full implementation. This work has been extremely beneficial in strengthening my understanding of low-level systems, memory management, and multithreaded programming.

Before this project, my knowledge of multithreading was very limited. The project served as a foundational step toward understanding professional systems programming concepts such as contention control, memory ownership, concurrent access patterns, and low-level allocation strategies.

This project is also where I implemented my own allocation system, inspired by std::pmr. That allocation system has its own repository.

Architecture

At its core, this project implements a sharded string interning system designed to reduce contention as much as possible while keeping lookups fast and lightweight.

Strings are distributed across shards using a hash-based sharding function. Each shard is completely independent and contains:

An unordered_set<std::string_view> used as the intern registry
Its own unsynchronized_chunk_allocator
A dedicated shared_mutex
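
As a rough illustration of the layout above, here is one way a shard and its selection function could look. The names Shard and shard_index are hypothetical, and std::hash combined with a modulo is an assumption; the sharding function actually used in this project may differ. The per-shard chunk allocator is left out so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <shared_mutex>
#include <string_view>
#include <unordered_set>

// Hypothetical shard layout mirroring the members listed above.
// The real shard also owns an unsynchronized_chunk_allocator (omitted here).
struct Shard {
    std::unordered_set<std::string_view> registry; // intern registry
    mutable std::shared_mutex mutex;               // guards the registry
};

// Maps a string to a shard index.
// Assumption: std::hash plus modulo; the project may hash differently.
inline std::size_t shard_index(std::string_view s, std::size_t shard_count) {
    assert(shard_count > 0 && "need at least one shard");
    return std::hash<std::string_view>{}(s) % shard_count;
}
```

Because the index depends only on the string contents, equal strings always land in the same shard, which is what lets each shard keep a fully independent registry.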

The idea behind the design is fairly simple: most operations are reads, so the synchronization model is optimized around that.

Shared locking is used during lookups
Exclusive locking only happens when a new string must be inserted

This allows already-interned strings to be retrieved without ever taking the exclusive lock, so concurrent readers of the same shard do not block one another.

Lookup Flow

When a string enters a shard, the system first checks if it already exists inside the shard registry under a shared lock.

If the string is already interned:

The existing string_view is retrieved
Its .data() pointer is returned directly
No allocation is performed
No exclusive locking is required

In practice, repeated lookups become extremely cheap since they mostly stay inside the shared-lock path.
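
The read path described above can be sketched roughly as follows. Shard and find_interned are illustrative names rather than the project's API, and returning nullptr on a miss is an assumption made to keep the example small.

```cpp
#include <cassert>
#include <shared_mutex>
#include <string_view>
#include <unordered_set>

// Hypothetical shard: registry plus its reader/writer mutex.
struct Shard {
    std::unordered_set<std::string_view> registry;
    mutable std::shared_mutex mutex;
};

// Returns the interned pointer, or nullptr if the string is not in the shard yet.
// No allocation happens here and only the shared lock is taken.
inline const char* find_interned(const Shard& shard, std::string_view s) {
    std::shared_lock lock(shard.mutex); // many readers may hold this at once
    auto it = shard.registry.find(s);
    return it != shard.registry.end() ? it->data() : nullptr;
}
```

The lookup compares string contents, not addresses, so a caller holding a temporary copy of the string still gets back the pointer to the stable interned copy.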

Allocation Flow

If the string is not found during lookup, the system switches to an exclusive lock.

At this stage:

Memory is requested from the shard's unsynchronized_chunk_allocator
The allocator itself acquires memory from a shared upstream atomic memory resource
The string is copied into the allocated memory region
A new std::string_view pointing to this stable memory is inserted into the registry
The resulting pointer is returned

Once interned, future lookups will directly reuse the same pointer without allocating again.
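
A minimal sketch of the full path, under several assumptions: std::pmr::monotonic_buffer_resource stands in for the shard's unsynchronized_chunk_allocator, the free function intern is a hypothetical name, and a terminating NUL is stored so the returned pointer can be used as a C string. The second lookup under the exclusive lock is also an assumption about the implementation; it is included because std::shared_mutex cannot upgrade a shared lock atomically, so another thread may insert the same string between the two locks.

```cpp
#include <cassert>
#include <memory_resource>
#include <mutex>
#include <shared_mutex>
#include <string_view>
#include <unordered_set>

struct Shard {
    std::unordered_set<std::string_view> registry;
    std::shared_mutex mutex;
    // Stand-in for the shard's chunk allocator (assumption, not the real type).
    std::pmr::monotonic_buffer_resource arena;
};

inline const char* intern(Shard& shard, std::string_view s) {
    {
        std::shared_lock read(shard.mutex); // fast path: shared lock only
        if (auto it = shard.registry.find(s); it != shard.registry.end())
            return it->data();
    }
    std::unique_lock write(shard.mutex);    // slow path: exclusive lock
    // Re-check: another thread may have inserted s between the two locks.
    if (auto it = shard.registry.find(s); it != shard.registry.end())
        return it->data();
    // Copy the characters into stable memory owned by the shard.
    char* mem = static_cast<char*>(shard.arena.allocate(s.size() + 1, alignof(char)));
    s.copy(mem, s.size());
    mem[s.size()] = '\0';                   // assumption: NUL-terminated storage
    shard.registry.emplace(mem, s.size());  // view points at the stable copy
    return mem;
}
```

The arena is not thread-safe on its own; it is only touched while the exclusive lock is held, which matches the idea of an unsynchronized allocator per shard.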

Memory Model

Each shard owns its own chunk allocator, but all allocators ultimately share the same upstream atomic memory resource.

The goal here was to keep shard logic local while still centralizing the actual memory acquisition in a concurrency-safe way.

This design provides:

Local allocation behavior per shard
Thread-safe upstream memory acquisition
Stable string lifetime guarantees
Reduced allocation overhead through chunk-based allocation

The atomic memory resource essentially acts as a global pool from which all shards acquire their memory ranges.
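
To make the relationship between the two layers more concrete, here is a simplified sketch of the idea. AtomicBumpBuffer and ChunkAllocator are hypothetical stand-ins, not the actual monotonic_atomic_buffer or unsynchronized_chunk_allocator, and the chunk size and exhaustion behavior (returning nullptr) are assumptions. Only char storage is handled, so alignment is ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical upstream: a fixed buffer handed out by atomically bumping an offset.
class AtomicBumpBuffer {
public:
    explicit AtomicBumpBuffer(std::size_t capacity)
        : storage_(new char[capacity]), capacity_(capacity) {}

    // Thread-safe; returns nullptr once the buffer is exhausted.
    char* acquire(std::size_t bytes) {
        std::size_t old = offset_.load(std::memory_order_relaxed);
        do {
            if (bytes > capacity_ - old) return nullptr;
        } while (!offset_.compare_exchange_weak(old, old + bytes,
                                                std::memory_order_relaxed));
        return storage_.get() + old;
    }

private:
    std::unique_ptr<char[]> storage_;
    std::size_t capacity_;
    std::atomic<std::size_t> offset_{0};
};

// Hypothetical per-shard allocator: carves small allocations out of larger chunks.
class ChunkAllocator {
public:
    ChunkAllocator(AtomicBumpBuffer* upstream, std::size_t chunk_size)
        : upstream_(upstream), chunk_size_(chunk_size) {
        assert(upstream != nullptr);
    }

    // Not thread-safe: the caller is expected to hold the shard's exclusive lock.
    char* allocate(std::size_t bytes) {
        if (bytes > remaining_) {
            // Current chunk is too small: fetch a new one from the shared upstream.
            std::size_t want = bytes > chunk_size_ ? bytes : chunk_size_;
            char* chunk = upstream_->acquire(want);
            if (!chunk) return nullptr;
            cursor_ = chunk;
            remaining_ = want;
        }
        char* out = cursor_;
        cursor_ += bytes;
        remaining_ -= bytes;
        return out;
    }

private:
    AtomicBumpBuffer* upstream_;
    std::size_t chunk_size_;
    char* cursor_ = nullptr;
    std::size_t remaining_ = 0;
};
```

In this sketch the shared upstream is touched only once per chunk, so most allocations stay local to the shard and the atomic operation is amortized over many strings.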

Example Usage

// Shared upstream resource from which the chunk allocators acquire memory.
exotic::memory::monotonic_atomic_buffer upstream(1 << 20);
exotic::memory::unsynchronized_chunk_allocator<char> alloc(&upstream);

exotic::intern::ShardedStringInterner<8> interner(&alloc); // 8 shards

const char* a = interner.intern("hello"); // first call: copies and registers "hello"
const char* b = interner.intern("hello"); // already interned: same pointer, so a == b

Notes

This project is intentionally designed as a learning-focused system rather than a drop-in standard library replacement. That’s how I learn, and it seems to work very well.

License

This project is licensed under the Boost Software License. See the LICENSE file for details.
