diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index 8608a6388e0..d63d19d181d 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -544,6 +544,39 @@ Primitive Type Mappings | UUID extension type | UUID | +----------------------+------------------------+ +.. _timestamp_with_offset_extension: + +Timestamp With Offset +============= +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded with a preferred (*but not required*) index type of ``int8``, or run-end-encoded with a preferred (*but not required*) runs type of ``int8``. + +.. note:: + + Although not required, it is *recommended* that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. + + The rationale behind this recommendation is that many programming languages provide support for parsing RFC3339 out of the box, facilitating consumption of timezone-aware JSON-encoded Arrow arrays without extra boilerplate just for integrating with Arrow. + Community Extension Types =========================