33 changes: 33 additions & 0 deletions docs/source/format/CanonicalExtensions.rst
@@ -544,6 +544,39 @@ Primitive Type Mappings
| UUID extension type | UUID |
+----------------------+------------------------+

.. _timestamp_with_offset_extension:

Timestamp With Offset
=====================
This type represents a timestamp column in which each value may carry a different time zone offset. Each timestamp is stored in UTC alongside its original offset from UTC, in minutes.
This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines.

* Extension name: ``arrow.timestamp_with_offset``.

* The storage type of the extension is a ``Struct`` with 2 fields, in order:

  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).

  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset from UTC in minutes. Negative offsets represent time zones west of UTC, while positive offsets represent time zones east of UTC. Offsets range from -779 (-12:59) to +780 (+13:00).

* Extension type parameters:

  * ``time_unit``: the time unit of each of the stored UTC timestamps.

* Description of the serialization:

  Extension metadata is an empty string.
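As an illustration of the storage layout described above, the following minimal sketch splits an offset-aware value into the ``(timestamp, offset_minutes)`` pair and rebuilds it. It uses only the Python standard library and assumes ``time_unit`` is seconds; the helper names ``to_storage`` and ``from_storage`` are hypothetical and not part of any Arrow API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_storage(local: datetime) -> tuple[int, int]:
    """Split an offset-aware datetime into (UTC epoch seconds, offset_minutes).

    Hypothetical helper: assumes a time_unit of seconds.
    """
    offset = local.utcoffset()
    if offset is None:
        raise ValueError("datetime must be offset-aware")
    offset_minutes = offset // timedelta(minutes=1)
    # Range stated in the specification text: -779 (-12:59) to +780 (+13:00).
    if not -779 <= offset_minutes <= 780:
        raise ValueError(f"offset out of range: {offset_minutes}")
    # Subtracting aware datetimes yields the exact elapsed time since the epoch in UTC.
    utc_seconds = (local - EPOCH) // timedelta(seconds=1)
    return utc_seconds, offset_minutes


def from_storage(utc_seconds: int, offset_minutes: int) -> datetime:
    """Rebuild the offset-aware datetime from the two stored fields."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return datetime.fromtimestamp(utc_seconds, tz=tz)


local = datetime(2025, 1, 1, tzinfo=timezone(timedelta(hours=-7)))
stored = to_storage(local)
restored = from_storage(*stored)
```

Because the timestamp field is always in UTC, values with different offsets remain directly comparable, while ``offset_minutes`` preserves the wall-clock time the value was originally expressed in.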

.. note::

   It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded with a preferred (*but not required*) index type of ``int8``, or run-end-encoded with a preferred (*but not required*) runs type of ``int8``.
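To see why these encodings suit ``offset_minutes``, note that a column usually contains only a few distinct offsets, often in long runs. The sketch below is a plain-Python illustration of run-end encoding, assuming the usual convention that each run end is the cumulative, exclusive end index of its run. The function name ``run_end_encode`` is hypothetical, and the sketch does not use or model any Arrow library API.

```python
def run_end_encode(offsets: list[int]) -> tuple[list[int], list[int]]:
    """Return (run_ends, values) for a list of offset_minutes values.

    Hypothetical helper for illustration only.
    """
    run_ends: list[int] = []
    values: list[int] = []
    for end, offset in enumerate(offsets, start=1):
        if values and values[-1] == offset:
            # Same value as the current run: extend the run's end index.
            run_ends[-1] = end
        else:
            # New value: start a new run ending at this position.
            run_ends.append(end)
            values.append(offset)
    return run_ends, values


run_ends, values = run_end_encode([-420, -420, -420, 60, 60])
```

Five offsets collapse into two runs here; dictionary encoding achieves a similar saving by storing each distinct offset once and referring to it by a small index.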

.. note::

   Although not required, it is *recommended* that implementations represent this type as an RFC 3339 string when serializing to and deserializing from JSON, preserving the ``TimeUnit`` precision and time zone offset without loss of information. For example, ``2025-01-01T00:00:00Z`` represents midnight on January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after midnight on January 1st 2025 in UTC-07.

   The rationale behind this recommendation is that many programming languages can parse RFC 3339 out of the box, so timezone-aware JSON-encoded Arrow arrays can be consumed without extra Arrow-specific boilerplate.
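The recommended JSON representation can be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the function name ``to_rfc3339`` is hypothetical, and two choices are assumptions of this sketch rather than requirements of the text above, namely writing a zero offset as ``Z`` and emitting exactly 0, 3, 6 or 9 fractional digits according to the time unit. Integer arithmetic is used so that nanosecond values are not rounded.

```python
from datetime import datetime, timedelta

# Units per second and fractional digits for each Arrow TimeUnit.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}
FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def to_rfc3339(value: int, unit: str, offset_minutes: int) -> str:
    """Format a stored (UTC timestamp, offset_minutes) pair as an RFC 3339 string.

    Hypothetical helper for illustration only.
    """
    # Floor division keeps the fractional part non-negative, even before 1970.
    seconds, fraction = divmod(value, UNITS_PER_SECOND[unit])
    # Shift the UTC instant by the offset to obtain the local wall-clock time.
    local = datetime(1970, 1, 1) + timedelta(seconds=seconds, minutes=offset_minutes)
    text = local.isoformat(timespec="seconds")
    digits = FRACTION_DIGITS[unit]
    if digits:
        text += f".{fraction:0{digits}d}"
    if offset_minutes == 0:
        return text + "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{text}{sign}{hours:02d}:{minutes:02d}"


utc_second = to_rfc3339(1735689600, "s", 0)
nano_west = to_rfc3339(1735714800 * 10**9 + 1, "ns", -420)
```

With these inputs the function reproduces the two example strings from the note above, one for second precision in UTC and one for nanosecond precision in UTC-07.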

Community Extension Types
=========================
