atomix

Languages: English | 简体中文 | 日本語 | Español | Français

Atomic operations with explicit memory ordering for Go.

Overview

Go's sync/atomic provides atomic operations with sequential consistency. This library exposes C++11/C11 memory model orderings (Relaxed, Acquire, Release, AcqRel) through architecture-specific implementations.

import "code.hybscloud.com/atomix"

var counter atomix.Int64

// Method-based API with ordering suffix
counter.AddRelaxed(1)    // Relaxed: no synchronization
counter.Add(1)           // AcqRel: default RMW ordering

// Pointer-based API for raw memory
var flags int32
atomix.Relaxed.StoreInt32(&flags, 1)
val := atomix.Acquire.LoadInt32(&flags)

Installation

go get code.hybscloud.com/atomix

Requirements: Go 1.26+

Memory Ordering

The library implements four orderings from the C++11 memory model:

Ordering	Semantics
Relaxed	Atomicity only. No synchronization or ordering constraints.
Acquire	Subsequent reads/writes cannot be reordered before this load. Pairs with Release stores.
Release	Prior reads/writes cannot be reordered after this store. Pairs with Acquire loads.
AcqRel	Combines Acquire and Release semantics. For read-modify-write operations.
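
The Release/Acquire pairing in the table can be illustrated with a small message-passing sketch. It deliberately uses only the standard library's sync/atomic, whose operations are sequentially consistent and therefore at least as strong as Release/Acquire, so it runs without atomix. The function name publishAndConsume is hypothetical; the comments mark where a Release store and an Acquire load would be sufficient.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publishAndConsume passes a plain (non-atomic) payload from a producer
// goroutine to the caller, synchronized only by an atomic flag.
func publishAndConsume(value int) int {
	var payload int       // ordinary data, written before the flag is set
	var ready atomic.Bool // the flag carries the synchronization edge

	go func() {
		payload = value   // 1. write the data
		ready.Store(true) // 2. publish (a Release store would suffice)
	}()

	// 3. wait for the flag (an Acquire load would suffice)
	for !ready.Load() {
		runtime.Gosched()
	}
	return payload // 4. observes the producer's write
}

func main() {
	fmt.Println(publishAndConsume(42))
}
```

If the flag were written and read with Relaxed ordering instead, nothing would order the payload write before the flag store on a weakly ordered machine, which is the case the Acquire and Release rows describe.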

Ordering Selection

Default methods (no ordering suffix) use:

Load operations: Relaxed
Store operations: Relaxed
Read-modify-write operations: AcqRel

Note: sync/atomic operations are sequentially consistent. atomix defaults to Relaxed for Load and Store, which maps to different instructions on weakly-ordered architectures (e.g., LDR vs LDAR on ARM64). Use LoadAcquire/StoreRelease when an acquire/release synchronization edge is required.

Ordering Selection Matrix

Use Case	Ordering	Rationale
Statistics counters	Relaxed	No synchronization needed; eventual consistency acceptable
Reference counting	AcqRel	Ensures visibility of object state before deallocation
Producer-consumer flags	Release/Acquire	Producer releases data, consumer acquires
Spinlock acquire	Acquire	Critical section reads must see prior writes
Spinlock release	Release	Critical section writes must complete before unlock
Sequence locks	AcqRel	Both directions need ordering
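
The spinlock rows of the matrix can be sketched as follows. This is a minimal illustration built on sync/atomic (sequentially consistent, so stronger than strictly required), not code taken from atomix; the spinLock type and countWithLock helper are hypothetical names. The comments mark which ordering each operation would need.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock is a minimal test-and-set lock: 0 = unlocked, 1 = locked.
type spinLock struct {
	state atomic.Uint32
}

// Lock spins until it flips state from 0 to 1. Acquire ordering is enough
// here: the critical section must not move above the successful CAS.
func (l *spinLock) Lock() {
	for !l.state.CompareAndSwap(0, 1) {
		runtime.Gosched() // yield instead of burning the CPU
	}
}

// Unlock stores 0. Release ordering is enough here: critical-section
// writes must be visible before the lock appears free.
func (l *spinLock) Unlock() {
	l.state.Store(0)
}

// countWithLock increments a plain int from several goroutines under the lock.
func countWithLock(goroutines, perGoroutine int) int {
	var (
		lock  spinLock
		total int
		wg    sync.WaitGroup
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				lock.Lock()
				total++
				lock.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWithLock(4, 1000))
}
```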

Types

Value Types

Type	Size	Description
`Bool`	4 bytes	Atomic boolean (backed by uint32)
`Int32`, `Uint32`	4 bytes	32-bit integers
`Int64`, `Uint64`	8 bytes	64-bit integers
`Uintptr`	8 bytes	Pointer-sized integer
`Pointer[T]`	8 bytes	Generic atomic pointer
`Int128`, `Uint128`	16 bytes	128-bit integers (requires 16-byte alignment)

Padded Types

Padded variants (Int64Padded, Uint64Padded, etc.) occupy a full cache line (64 bytes) to prevent false sharing when multiple atomic variables are accessed by different CPU cores.

// Without padding: variables may share cache line, causing contention
var a, b atomix.Int64  // May be adjacent in memory

// With padding: each variable occupies its own cache line
var c, d atomix.Int64Padded  // 64-byte separation guaranteed
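
To show the layout idea behind padding, here is a sketch using only the standard library. The paddedInt64 type is a hypothetical stand-in, not the atomix definition, and it assumes the 64-byte cache line quoted above.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// cacheLineSize is assumed to be 64 bytes, matching the figure quoted above.
const cacheLineSize = 64

// paddedInt64 is a hypothetical padded atomic: an 8-byte atomic value
// followed by padding that fills the rest of the cache line.
type paddedInt64 struct {
	v atomic.Int64
	_ [cacheLineSize - 8]byte
}

// paddedSize returns the size of one padded element in bytes.
func paddedSize() uintptr {
	var p paddedInt64
	return unsafe.Sizeof(p)
}

// strideBetween returns the distance in bytes between two adjacent elements
// of an array of padded counters.
func strideBetween() uintptr {
	var counters [4]paddedInt64
	first := uintptr(unsafe.Pointer(&counters[0]))
	second := uintptr(unsafe.Pointer(&counters[1]))
	return second - first
}

func main() {
	fmt.Println(paddedSize(), strideBetween())
}
```

Because adjacent values sit 64 bytes apart, no two of them can fall into the same 64-byte line, so writers on different cores do not invalidate each other's lines.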

Operations

Operation	Returns	Description
`Load`	value	Atomic read
`Store`	—	Atomic write
`Swap`	old value	Atomic exchange
`CompareAndSwap`	bool	Returns true if exchange occurred
`CompareExchange`	old value	Returns previous value regardless of success
`Add`, `Sub`	new value	Atomic arithmetic
`Inc`, `Dec`	new value	Atomic increment/decrement by 1
`And`, `Or`, `Xor`	old value	Atomic bitwise operations
`Max`, `Min`	old value	Atomic maximum/minimum

Return value semantics: Wrapper Add/Sub/Inc/Dec and pointer-based 32/64/uintptr Add/Sub return the new value. Pointer-based MemoryOrder.AddInt128 and MemoryOrder.AddUint128 return the old value. Swap/And/Or/Xor/Max/Min return the old value.
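
The difference between the "new value" and "old value" conventions can be seen in a short sketch built on sync/atomic. The fetchMax helper is hypothetical and only imitates the old-value contract described for Max; it is not the atomix implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fetchMax raises *addr to at least v and returns the value that was stored
// before the call (the "old value" convention).
func fetchMax(addr *atomic.Int64, v int64) (old int64) {
	for {
		old = addr.Load()
		if old >= v || addr.CompareAndSwap(old, v) {
			return old
		}
	}
}

// returnValueDemo exercises both conventions on one variable.
func returnValueDemo() (newVal, oldVal, prev1, prev2, final int64) {
	var x atomic.Int64
	x.Store(5)

	newVal = x.Add(3)        // Add returns the new value: 8
	oldVal = newVal - 3      // the old value is recoverable from it: 5
	prev1 = fetchMax(&x, 7)  // 7 < 8: x is unchanged, returns 8
	prev2 = fetchMax(&x, 10) // 10 > 8: x becomes 10, returns 8
	return newVal, oldVal, prev1, prev2, x.Load()
}

func main() {
	fmt.Println(returnValueDemo())
}
```

With an old-value operation the caller learns whether it changed anything by comparing the result with its argument; with a new-value operation the previous state has to be derived, as the subtraction shows.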

CompareAndSwap vs CompareExchange

// CompareAndSwap: returns success/failure
if v.CompareAndSwap(old, next) {
    // Success
}

// CompareExchange: returns previous value (enables CAS loops without a separate Load per retry)
old := v.Load()  // Load once, before the loop
for {
    next := transform(old)
    prev := v.CompareExchange(old, next)
    if prev == old {
        break  // Success
    }
    old = prev  // Failure: retry with the value the exchange observed
}
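
The previous-value contract can also be tried out without atomix. The sketch below emulates it on top of sync/atomic, which only offers a boolean CompareAndSwap; compareExchange, double and doubleConcurrently are hypothetical helpers written for this illustration, and the emulation says nothing about how atomix implements CompareExchange.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// compareExchange stores desired if *addr equals old and returns the value
// that was in *addr: old on success, the observed value on failure.
func compareExchange(addr *atomic.Int64, old, desired int64) int64 {
	for {
		if addr.CompareAndSwap(old, desired) {
			return old // exchange happened
		}
		if cur := addr.Load(); cur != old {
			return cur // exchange failed; report what was observed
		}
		// The value changed back to old between the CAS and the Load: retry.
	}
}

// double multiplies *addr by two. It loads once and then reuses the value
// returned by each failed exchange instead of loading again.
func double(addr *atomic.Int64) {
	old := addr.Load()
	for {
		prev := compareExchange(addr, old, old*2)
		if prev == old {
			return
		}
		old = prev
	}
}

// doubleConcurrently starts at 1 and doubles the value from n goroutines.
func doubleConcurrently(n int) int64 {
	var v atomic.Int64
	v.Store(1)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			double(&v)
		}()
	}
	wg.Wait()
	return v.Load()
}

func main() {
	fmt.Println(doubleConcurrently(8))
}
```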

Pointer-Based API

For interoperation with memory-mapped regions, shared memory, or io_uring rings:

var flags int32

atomix.Relaxed.StoreInt32(&flags, 1)
val := atomix.Acquire.LoadInt32(&flags)
atomix.Release.CompareAndSwapInt32(&flags, 0, 1)

The pointer-based API operates on raw *int32, *int64, etc., rather than wrapper types. This is useful when atomic variables cannot use wrapper types (e.g., fields in kernel-shared structures).
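
As an illustration of why raw pointers matter, the sketch below works on a fixed-layout structure whose fields are plain integers. It uses the function-style API of sync/atomic as a stand-in for the pointer-based calls shown above; the ringHeader layout and helper names are hypothetical and do not correspond to any real kernel structure.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ringHeader mimics a structure shared with another component. Its fields
// are plain integers and cannot be replaced by wrapper types.
type ringHeader struct {
	head  uint32 // advanced by the consumer
	tail  uint32 // advanced by the producer
	flags int32
}

// pending reports how many entries are queued, reading both indices
// atomically through pointers to the raw fields.
func pending(h *ringHeader) uint32 {
	tail := atomic.LoadUint32(&h.tail)
	head := atomic.LoadUint32(&h.head)
	return tail - head // unsigned subtraction tolerates index wraparound
}

// ringDemo publishes three entries, consumes one, and returns the backlog.
func ringDemo() uint32 {
	var h ringHeader
	atomic.StoreInt32(&h.flags, 1)
	atomic.AddUint32(&h.tail, 3)
	atomic.AddUint32(&h.head, 1)
	return pending(&h)
}

func main() {
	fmt.Println(ringDemo())
}
```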

128-bit Operations

128-bit atomics require 16-byte alignment. Use placement helpers for shared memory:

buf := make([]byte, 32)
_, ptr := atomix.PlaceAlignedUint128(buf, 0)
ptr.Store(lo, hi)

var v atomix.Uint128  // Ensure this variable is placed at a 16-byte aligned address
v.Store(lo, hi)
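
The arithmetic behind placing a 16-byte value in a byte buffer can be sketched with the standard library alone. The alignedOffset and isAligned helpers are hypothetical and are not the atomix placement API; they only show why a 32-byte buffer always has room for one aligned 16-byte slot.

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignedOffset returns the smallest offset >= start at which &buf[offset]
// is a multiple of align (a power of two), or -1 if size bytes do not fit.
func alignedOffset(buf []byte, start, size int, align uintptr) int {
	if start < 0 || start >= len(buf) {
		return -1
	}
	addr := uintptr(unsafe.Pointer(&buf[start]))
	pad := int((align - addr%align) % align) // bytes to skip to reach alignment
	off := start + pad
	if off+size > len(buf) {
		return -1
	}
	return off
}

// isAligned reports whether &buf[off] is a multiple of align.
func isAligned(buf []byte, off int, align uintptr) bool {
	return uintptr(unsafe.Pointer(&buf[off]))%align == 0
}

func main() {
	buf := make([]byte, 32) // 16 bytes of payload plus up to 15 bytes of slack
	off := alignedOffset(buf, 0, 16, 16)
	fmt.Println(off >= 0 && isAligned(buf, off, 16))
}
```

At most 15 bytes are skipped to reach alignment, so 16 payload bytes always fit in 32.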

Architecture	128-bit Implementation
amd64	`LOCK CMPXCHG16B`
arm64	`LDXP/STXP` (default) or `CASP` (`-tags=lse2`)
riscv64, loong64	Low-word LR/SC on riscv64 or LL/SC on loong64; not true 128-bit atomicity

On riscv64 and loong64, 128-bit SwapAcquire, SwapRelease, and SwapAcqRel alias the relaxed low-word swap path; use a separate 32/64-bit synchronization primitive or external synchronization when acquire/release publication is required.

Note: 128-bit atomics are primarily useful for double-word CAS patterns (e.g., lock-free data structures with version counters).
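
The version-counter pattern mentioned in the note can be shown at reduced scale. The sketch below packs a 32-bit value and a 32-bit version into one 64-bit word so that sync/atomic can update both together; with 128-bit atomics each half could be a full 64 bits. All names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pack combines a 32-bit value and a 32-bit version counter into one word.
func pack(value, version uint32) uint64 {
	return uint64(version)<<32 | uint64(value)
}

// unpack splits a packed word back into value and version.
func unpack(w uint64) (value, version uint32) {
	return uint32(w), uint32(w >> 32)
}

// update replaces the value only if the whole word still equals what the
// caller observed, and bumps the version on success. A stale observer fails
// even if the value has returned to what it saw (the ABA case).
func update(word *atomic.Uint64, seen uint64, newValue uint32) bool {
	_, version := unpack(seen)
	return word.CompareAndSwap(seen, pack(newValue, version+1))
}

// abaDemo reports whether a stale update succeeded, plus the final state.
func abaDemo() (staleOK bool, value, version uint32) {
	var word atomic.Uint64
	word.Store(pack(7, 0))

	stale := word.Load()              // observer sees value 7, version 0
	update(&word, word.Load(), 9)     // 7 -> 9, version becomes 1
	update(&word, word.Load(), 7)     // 9 -> 7, version becomes 2
	staleOK = update(&word, stale, 1) // rejected: version no longer matches

	value, version = unpack(word.Load())
	return staleOK, value, version
}

func main() {
	fmt.Println(abaDemo())
}
```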

Architecture Implementation

x86-64 (TSO)

x86-64 provides Total Store Ordering (TSO), a strong memory model where:

All loads have implicit acquire semantics
All stores have implicit release semantics
Store-load ordering requires explicit barrier (MFENCE) or locked instruction

None of the four orderings (Relaxed, Acquire, Release, AcqRel) requires store-load ordering, so all ordering variants compile to identical machine code on x86-64. The primary role of explicit ordering on x86-64 is documentation and portability.

Operation	Instruction	Notes
Load	`MOV`	Plain memory access
Store	`MOV`	Plain memory access
Add	`LOCK XADD`	Instruction returns old operand; Add APIs that promise new values compute them after the instruction
Swap	`XCHG`	Implicit LOCK
CAS	`LOCK CMPXCHG`
And/Or/Xor	`LOCK CMPXCHG` loop	Returns old value via CAS loop
CAS128	`LOCK CMPXCHG16B`

Load and Store are implemented in pure Go for compiler inlining.

ARM64 (Weakly Ordered)

ARM64 has a weakly ordered memory model requiring explicit ordering instructions. atomix documents ARMv8.4+ as the ARM64 package baseline for LSE-backed 32/64-bit atomics. The default 128-bit LL/SC path is documented separately for ARMv8.1+. LSE (Large System Extensions) provides atomic instructions with ordering suffixes:

Suffix meanings: No suffix = Relaxed, A = Acquire, L = Release, AL = Acquire-Release

Operation	Relaxed	Acquire	Release	AcqRel
Load	`LDR`	`LDAR`	—	—
Store	`STR`	—	`STLR`	—
Add	`LDADD`	`LDADDA`	`LDADDL`	`LDADDAL`
CAS	`CAS`	`CASA`	`CASL`	`CASAL`
Swap	`SWP`	`SWPA`	`SWPL`	`SWPAL`
And	`LDCLR`†	`LDCLRA`	`LDCLRL`	`LDCLRAL`
Or	`LDSET`	`LDSETA`	`LDSETL`	`LDSETAL`
Xor	`LDEOR`	`LDEORA`	`LDEORL`	`LDEORAL`

† LDCLR clears the bits that are set in its operand (AND with complement). To implement And(mask), pass the bitwise complement of the mask (^mask in Go).
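
The complement relationship in the footnote can be checked in plain Go. The bitClear function only models the effect of a bit-clear instruction on a value; it is a hypothetical helper and involves no atomics.

```go
package main

import "fmt"

// bitClear models what an LDCLR-style instruction does to memory:
// it clears every bit of x that is set in operand.
func bitClear(x, operand uint32) uint32 {
	return x &^ operand
}

// andViaClear computes x AND mask using only bitClear, by passing the
// complement of the mask (written ^mask in Go).
func andViaClear(x, mask uint32) uint32 {
	return bitClear(x, ^mask)
}

func main() {
	x, mask := uint32(0b1011_0110), uint32(0b0000_1111)
	fmt.Printf("%08b %08b\n", x&mask, andViaClear(x, mask))
}
```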

Relaxed load/store are implemented in pure Go for inlining. Other orderings use assembly with LSE instructions.

128-bit Operations

Build Tag	Instructions	Target Hardware
(default)	`LDXP/STXP` (LL/SC loop)	ARMv8.1+ LL/SC path
`-tags=lse2`	`CASP` (single instruction)	ARMv8.4+ with LSE2

LL/SC (Load-Link/Store-Conditional) retries on contention and is documented for ARMv8.1+. CASP provides single-instruction atomicity on ARMv8.4+ with LSE2.

RISC-V 64-bit

RISC-V RVWMO (Weak Memory Ordering) uses explicit fence instructions:

Operation	Implementation
Load Relaxed	`LD`
Load Acquire	`LD` + `FENCE R,RW`
Store Relaxed	`SD`
Store Release	`FENCE RW,W` + `SD`
RMW	`AMO` instructions with `.aq`/`.rl` modifiers

128-bit operations use low-word LR/SC emulation and may exhibit torn reads.

LoongArch 64-bit

LoongArch uses DBAR (data barrier) instructions:

Operation	Implementation
Load Relaxed	`LD.D`
Load Acquire	`LD.D` + `DBAR`
Store Relaxed	`ST.D`
Store Release	`DBAR` + `ST.D`
RMW	`AM*_DB` instructions

128-bit operations use low-word LL/SC emulation and may exhibit torn reads.

Fallback

Unsupported architectures use sync/atomic, which provides sequential consistency. 128-bit operations on fallback architectures are not atomic (two separate 64-bit operations).
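
Where 128-bit operations are not truly atomic, one option is external synchronization. The sketch below guards two 64-bit halves with a mutex so that they are always read and written as a unit; guarded128 and tornReads are hypothetical names, and this trades lock-freedom for consistency.

```go
package main

import (
	"fmt"
	"sync"
)

// guarded128 keeps two 64-bit halves consistent with a mutex.
type guarded128 struct {
	mu     sync.Mutex
	lo, hi uint64
}

// Store writes both halves as one unit.
func (g *guarded128) Store(lo, hi uint64) {
	g.mu.Lock()
	g.lo, g.hi = lo, hi
	g.mu.Unlock()
}

// Load reads both halves as one unit, so a reader never sees the low half
// of one Store combined with the high half of another.
func (g *guarded128) Load() (lo, hi uint64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.lo, g.hi
}

// tornReads runs a writer that always stores matching halves and counts how
// often a concurrent reader observes mismatched halves.
func tornReads(iterations int) int {
	var g guarded128
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= iterations; i++ {
			g.Store(uint64(i), uint64(i))
		}
	}()
	torn := 0
	for i := 0; i < iterations; i++ {
		if lo, hi := g.Load(); lo != hi {
			torn++
		}
	}
	wg.Wait()
	return torn
}

func main() {
	fmt.Println(tornReads(10000))
}
```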

Design Rationale

Explicit Memory Ordering

Instruction selection on weak architectures: ARM64/RISC-V select different instructions based on ordering requirements
Documentation: Ordering suffix documents synchronization intent
Portability: Code explicitly specifies requirements rather than relying on architecture-specific guarantees
Correctness: Makes memory ordering decisions explicit and reviewable

Comparison with sync/atomic

sync/atomic provides one sequentially consistent ordering for all operations:

Single ordering contract across operations
Portable behavior across Go architectures
No per-operation ordering selection

atomix targets:

Lock-free data structures
Kernel or hardware interface interoperation (io_uring, shared memory)
C/C++ ports that already carry explicit memory ordering
ARM64/RISC-V paths where explicit ordering controls instruction selection

Platform Support

Platform	Implementation
linux/amd64	Native assembly
linux/arm64	Native assembly with LSE; ARMv8.4+ package baseline
linux/riscv64	Native assembly (128-bit emulated)
linux/loong64	Native assembly (128-bit emulated)
darwin/amd64, darwin/arm64	Native assembly
freebsd/amd64, freebsd/arm64	Native assembly
Other	sync/atomic fallback

Compiler Intrinsics

atomix provides a customized Go compiler that replaces calls to atomix functions with inline atomic instructions. Each call becomes a single CPU instruction, eliminating call overhead.

Compiler Workflow

# Install the intrinsics-customized compiler
make install-compiler

# Build with intrinsics
make build

# Test with intrinsics (120s timeout)
make test

# Run benchmarks with a longer timeout
make bench

# Verify intrinsics are applied
make verify

What the Compiler Does

The customized compiler adds SSA operations for atomix intrinsics:

Operation	x86-64	ARM64
Load (Relaxed)	`MOV`	`LDR`
Load (Acquire)	`MOV`	`LDAR`
Store (Relaxed)	`MOV`	`STR`
Store (Release)	`MOV`	`STLR`
Add (AcqRel)	`LOCK XADD`	`LDADDAL`
CAS	`LOCK CMPXCHG`	`CASAL`

x86-64 TSO optimization: Release stores use plain MOV instead of XCHG, leveraging x86-64's Total Store Ordering which provides implicit release semantics for all stores.

Manual Compiler Setup

If you prefer manual setup over the Makefile:

# Clone the intrinsics compiler
git clone --branch atomix https://github.com/hayabusa-cloud/go.git ~/github.com/go

# Build the compiler
cd ~/github.com/go/src && ./make.bash

# Use for atomix
GOROOT=~/github.com/go ~/github.com/go/bin/go build ./...

See intrinsics.md for detailed implementation documentation.

License

MIT — see LICENSE.
