Proposal: Hexadecimal and HexID, or AlphaNumber and AlphaID #688

@mother10

The idea here is to have a way of shortening the length of ID pointer values.
Each record now has a pointer that looks like this:

@ followed by 1 or 2 letters (denoting the record type), followed by a number, and ended with an @.

In the new system with STICKYs, a file with 600,000 persons might contain at least 3 * 600,000 STICKYs, so around 1.8 million or more. Their pointers will contain many digits and be difficult for humans to read (though not for software).

That's why PR #679 introduces a new data type, Hexadecimal, and an accompanying HexID.

Defined as:

Hexadecimal

A Hexadecimal is a non-empty sequence of ASCII hexadecimal digits, representing an unsigned integer in base-16.

Characters MUST be chosen from 0–9 and uppercase letters A–F.
Letters must be uppercase in GEDCOM files to ensure consistency and compatibility across systems.

Leading zeros are permitted but have no semantic meaning.
Negative values are not supported.

This type may be used for scalable, compact identifiers where decimal values become inefficient or overly long.

Hexadecimal = 1*( %x30-39 / %x41-46 )  ; 0–9 or A–F (uppercase only)

:::example
Examples:

  • 1
  • 7F
  • 00A3
  • 00F4240 (decimal 1,000,000)
  • 2DC6C0 (decimal 3,000,000)
:::
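The rules above can be sketched in a few lines of Python. This is only an illustration of the proposed grammar, not part of PR #679; the function names `parse_hexadecimal` and `format_hexadecimal` and the `min_width` parameter are hypothetical.

```python
import re

# Mirrors the ABNF rule: Hexadecimal = 1*( %x30-39 / %x41-46 )
# Explicit character ranges are used so that only ASCII 0-9 and A-F match.
HEXADECIMAL_RE = re.compile(r"[0-9A-F]+")


def parse_hexadecimal(text: str) -> int:
    """Return the unsigned integer value of a Hexadecimal payload."""
    if not HEXADECIMAL_RE.fullmatch(text):
        raise ValueError(f"not a valid Hexadecimal: {text!r}")
    # Leading zeros are accepted and do not change the value.
    return int(text, 16)


def format_hexadecimal(value: int, min_width: int = 1) -> str:
    """Render an unsigned integer as uppercase hex, zero-padded to min_width."""
    if value < 0:
        raise ValueError("negative values are not supported")
    return format(value, "X").zfill(min_width)


if __name__ == "__main__":
    for sample in ["1", "7F", "00A3", "00F4240", "2DC6C0"]:
        print(sample, "=", parse_hexadecimal(sample))
```

Lowercase input such as `7f` is rejected here, because the proposal requires uppercase letters in GEDCOM files.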

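The URI for the Hexadecimal data type is:
xsd:hexBinary (loosely aligned; the GEDCOM format is a plain digit string without spacing or byte-grouping, whereas xsd:hexBinary requires an even number of digits, so a value such as 1 is not valid hexBinary).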

HexID

A HexID is a cross-reference identifier used to uniquely identify a record within a GEDCOM file using a hexadecimal format.
This format can be used when the quantity of records (such as STICKYs) may exceed the limitations of base-10 padded numbering.

It is composed of:

  • A record-type prefix (e.g., ST, I, R)
  • A required separator digit 0 (zero)
  • A Hexadecimal number that can grow in length as needed

The entire identifier is enclosed in @ symbols, consistent with GEDCOM cross-reference ID syntax.

HexID = "@" Prefix "0" Hexadecimal "@"
Prefix = %x41-5A / 2(%x41-5A)  ; one or two uppercase letters A–Z

::: example
Examples:

  • @ST00001@

  • @ST003E8@

  • @ST0F4240@ (decimal 1,000,000)

  • @R02DC6C0@ (REPO example, decimal 3,000,000)
:::

  • The Hexadecimal portion MUST conform to the rules in the Hexadecimal data type definition.

  • The record-type prefix MUST consist of exactly one or two uppercase ASCII letters (A–Z).
    This includes standard GEDCOM record types (S, I, F, etc.) and new custom types such as ST for STICKY or SP for SPLAC.

The URI for the HexID type is: g8:HexID
This format is GEDCOM-specific and has no standard XML Schema equivalent.
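As a rough sketch of how software might read and write such identifiers, the following Python follows the `HexID` ABNF above. The names `parse_hex_id` and `make_hex_id` and the `min_width` parameter are hypothetical and not taken from the proposal.

```python
import re

# HexID = "@" Prefix "0" Hexadecimal "@"
# Prefix is one or two uppercase letters; the first "0" after it is the separator.
HEX_ID_RE = re.compile(r"@([A-Z]{1,2})0([0-9A-F]+)@")


def parse_hex_id(pointer: str) -> tuple[str, int]:
    """Split a HexID into (record-type prefix, numeric value)."""
    match = HEX_ID_RE.fullmatch(pointer)
    if match is None:
        raise ValueError(f"not a valid HexID: {pointer!r}")
    prefix, digits = match.groups()
    return prefix, int(digits, 16)


def make_hex_id(prefix: str, value: int, min_width: int = 1) -> str:
    """Build a HexID; min_width zero-pads the Hexadecimal part only."""
    if not re.fullmatch(r"[A-Z]{1,2}", prefix):
        raise ValueError("prefix must be one or two uppercase letters")
    if value < 0:
        raise ValueError("negative values are not supported")
    return f"@{prefix}0{format(value, 'X').zfill(min_width)}@"


if __name__ == "__main__":
    for sample in ["@ST00001@", "@ST003E8@", "@ST0F4240@", "@R02DC6C0@"]:
        print(sample, "->", parse_hex_id(sample))
```

Because leading zeros carry no numeric meaning, `@ST00001@` and `@ST01@` decode to the same value in this sketch. Whether they count as the same pointer is a question the definition above leaves open.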

Alternative: AlphaNumber and AlphaID

AlphaNumber

The AlphaNumber defines a numbering system using only uppercase alphabetic characters (A–Z). It may be interpreted as a base-26 counter where A = 0, B = 1, …, Z = 25, continuing after Z with two-letter sequences (AA = 26, AB = 27) in the same way spreadsheet columns are named. This is a bijective numbering rather than strict positional base-26, in which AA would equal 0 and duplicate A. Software may also treat the sequence as an opaque string without arithmetic interpretation.
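A minimal sketch of the counter interpretation, assuming the spreadsheet-style reading in which Z = 25 is followed by AA = 26. The function names `alpha_to_int` and `int_to_alpha` are hypothetical.

```python
def alpha_to_int(text: str) -> int:
    """Decode an AlphaNumber, with A = 0, Z = 25, AA = 26, AB = 27, ..."""
    if not text or any(not ("A" <= ch <= "Z") for ch in text):
        raise ValueError(f"not a valid AlphaNumber: {text!r}")
    value = 0
    for ch in text:
        # Each letter counts as 1..26 internally (bijective base-26).
        value = value * 26 + (ord(ch) - ord("A") + 1)
    return value - 1  # shift so that "A" maps to 0


def int_to_alpha(value: int) -> str:
    """Encode a non-negative integer as an AlphaNumber (inverse of alpha_to_int)."""
    if value < 0:
        raise ValueError("negative values are not supported")
    value += 1  # back to the 1-based bijective representation
    letters = []
    while value > 0:
        value, remainder = divmod(value - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


if __name__ == "__main__":
    for n in [0, 25, 26, 27]:
        print(n, "->", int_to_alpha(n))
```

Under this reading every string of letters names a different number, so no padding letter exists: unlike leading zeros in a Hexadecimal, a leading A changes the value.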

The AlphaID provides an alternative cross-reference identifier system using AlphaNumber instead of decimal or hexadecimal digits. It is suitable for use in record pointers, ensuring uniqueness while clearly separating the record abbreviation from its identifier. An AlphaID begins with the RecordAbbrev (1 or 2 uppercase letters), followed by one or more zeros as a required separator, and then an AlphaNumber. The entire identifier is enclosed in @ signs.

ABNF:

AlphaID        = "@" RecordAbbrev OneOrMoreZeros AlphaNumber "@"

RecordAbbrev   = 1*2ALPHA         ; Record abbreviation: 1 or 2 uppercase letters
OneOrMoreZeros = 1*("0")          ; At least one zero, more allowed
AlphaNumber    = 1*ALPHA          ; Base-26 string (A–Z)
ALPHA          = %x41-5A          ; A–Z

Note:
The AlphaNumber is conceptually a base-26 sequence. After Z, the sequence continues as AA, AB, AC, … Similar to how spreadsheet columns are named. This ensures that identifiers can scale indefinitely while remaining purely alphabetic.

:::example

@ST0ABC@      /* ST = record abbrev, 0 = required separator, ABC = AlphaNumber */
@ST00AAB@     /* two zeros for alignment */
@ST000XYZ@    /* three zeros, consistent fixed length */
@I0ZXY@       /* single-letter record abbrev */
@SN0BCD@      /* two-letter abbrev with minimal single zero */

:::
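One way to check identifiers against this variant of the grammar is a single regular expression. The sketch below is illustrative only; the name `is_alpha_id` is hypothetical.

```python
import re

# AlphaID = "@" RecordAbbrev OneOrMoreZeros AlphaNumber "@"
# RecordAbbrev: 1 or 2 uppercase letters; separator: one or more "0";
# AlphaNumber: one or more uppercase letters.
ALPHA_ID_RE = re.compile(r"@[A-Z]{1,2}0+[A-Z]+@")


def is_alpha_id(pointer: str) -> bool:
    """Return True if pointer matches the zeros-only AlphaID grammar."""
    return ALPHA_ID_RE.fullmatch(pointer) is not None


if __name__ == "__main__":
    for sample in ["@ST0ABC@", "@ST00AAB@", "@ST000XYZ@", "@I0ZXY@", "@SN0BCD@"]:
        print(sample, is_alpha_id(sample))
```

The zero separator is what makes the split unambiguous: without it, `@STABC@` could be read as ST + ABC or as S + TABC. An identifier containing any other digit, such as `@ST01ABC@`, does not match this variant.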

AlphaID

The AlphaID provides an alternative cross-reference identifier system using a base-26 alphabetic numbering scheme rather than decimal or hexadecimal digits.
It is designed for use in record pointers, ensuring uniqueness while keeping the format distinct from record abbreviations.

  • Structure:
    An AlphaID begins with the record abbreviation (RecordAbbrev), which consists of exactly 1 or 2 uppercase letters.
    This is followed by a required "0" (more zeros are allowed), which ensures that the pointer body does not clash with the abbreviation itself.
    After the zero, zero or more decimal digits (DIGIT) may follow (useful for padding, alignment, or user-chosen pointer lengths).
    Finally, one or more base-26 alphabetic characters (AlphaNumber) complete the identifier.
    The entire sequence is wrapped in @ signs.

  • Semantics:
    AlphaNumber is interpreted as a base-26 sequence using the letters A–Z (A = 0, B = 1, …, Z = 25).
    Software may treat the alphabetic portion as an incrementing counter, or simply as an opaque unique suffix.

  • ABNF:

AlphaID       = "@" RecordAbbrev "0" *DIGIT AlphaNumber "@"

RecordAbbrev  = 1*2ALPHA      ; Record abbreviation: 1 or 2 uppercase letters
AlphaNumber   = 1*ALPHA       ; Base-26 string (A–Z)
DIGIT         = %x30-39       ; 0–9
ALPHA         = %x41-5A       ; A–Z

Some examples:

:::example

@ST0ABC@      ; ST = record abbrev, 0 = required separator, ABC = AlphaNumber
@ST00AAB@     ; padded with extra digit 0
@ST000XYZ@    ; multiple zeros for alignment
@I0ZXY@       ; single-letter record abbrev, base-26 suffix
@SN01BCD@     ; two-letter abbrev, decimal digit before AlphaNumber

:::
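To illustrate how the three parts of this variant could be separated, here is a small Python sketch based on the ABNF above. The names `AlphaIdParts` and `parse_alpha_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class AlphaIdParts(NamedTuple):
    abbrev: str   # RecordAbbrev: 1 or 2 uppercase letters
    digits: str   # optional decimal digits after the required "0"
    alpha: str    # AlphaNumber: one or more uppercase letters


# AlphaID = "@" RecordAbbrev "0" *DIGIT AlphaNumber "@"
ALPHA_ID_RE = re.compile(r"@([A-Z]{1,2})0([0-9]*)([A-Z]+)@")


def parse_alpha_id(pointer: str) -> Optional[AlphaIdParts]:
    """Split an AlphaID into its parts, or return None if it does not match."""
    match = ALPHA_ID_RE.fullmatch(pointer)
    if match is None:
        return None
    return AlphaIdParts(*match.groups())


if __name__ == "__main__":
    for sample in ["@ST0ABC@", "@ST00AAB@", "@ST000XYZ@", "@I0ZXY@", "@SN01BCD@"]:
        print(sample, "->", parse_alpha_id(sample))
```

The required "0" is consumed by the pattern and not returned, so `digits` holds only what follows it: an empty string for `@ST0ABC@` and `"1"` for `@SN01BCD@`.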
