graphcore-research/gfloat

gfloat: Generic floating-point types in Python

An implementation of generic floating-point encode/decode logic, handling various current and proposed floating-point types, including binary16, bfloat16, and the OCP and P3109 formats listed in the table below.

The library favours readability and extensibility over speed, although the *_ndarray functions are reasonably fast for large arrays (see the benchmarking notebook). For implementations of these datatypes that focus more on speed, see, for example, ml_dtypes, bitstring, and the MX PyTorch Emulation Library.

See https://gfloat.readthedocs.io for documentation, or dive into the notebooks to explore the formats.

For example, here's a table from the 02-value-stats notebook:

| name | B: Bits in the format | P: Precision in bits | E: Exponent field width in bits | Exact in float16? | Exact in float32? | 0<x<1 | 1<x<Inf | minSubnormal | maxSubnormal | minNormal | maxNormal |
|--------------|-----|-----|-----|--------|--------|-------|-------|----------------|----------------|-------------|---------------|
| p3109_k3p2sf | 3 | 2 | 1 | True | True | 1 | 1 | 0.5 | 0.5 | 1 | 1.5 |
| ocp_e2m1 | 4 | 2 | 2 | True | True | 1 | 5 | 0.5 | 0.5 | 1 | 6 |
| p3109_k4p2sf | 4 | 2 | 2 | True | True | 3 | 3 | 0.25 | 0.25 | 0.5 | 3 |
| ocp_e2m3 | 6 | 4 | 2 | True | True | 7 | 23 | 0.125 | 0.875 | 1 | 7.5 |
| ocp_e3m2 | 6 | 3 | 3 | True | True | 11 | 19 | 0.0625 | 0.1875 | 0.25 | 28 |
| p3109_k6p3sf | 6 | 3 | 3 | True | True | 15 | 15 | 0.03125 | 0.09375 | 0.125 | 14 |
| p3109_k6p4sf | 6 | 4 | 2 | True | True | 15 | 15 | 0.0625 | 0.4375 | 0.5 | 3.75 |
| ocp_e4m3 | 8 | 4 | 4 | True | True | 55 | 70 | 2^-9 | 7/4 × 2^-7 | 0.015625 | 448 |
| ocp_e5m2 | 8 | 3 | 5 | True | True | 59 | 63 | 2^-16 | 3/2 × 2^-15 | 2^-14 | 57344 |
| p3109_k8p1se | 8 | 1 | 7 | False | True | 63 | 62 | n/a | n/a | 2^-63 | 2^62 |
| p3109_k8p1ue | 8 | 1 | 8 | False | True | 127 | 125 | n/a | n/a | 2^-127 | 2^125 |
| p3109_k8p3se | 8 | 3 | 5 | True | True | 63 | 62 | 2^-17 | 3/2 × 2^-16 | 2^-15 | 49152 |
| p3109_k8p3sf | 8 | 3 | 5 | True | True | 63 | 63 | 2^-17 | 3/2 × 2^-16 | 2^-15 | 57344 |
| p3109_k8p3ue | 8 | 3 | 6 | False | True | 127 | 125 | 2^-33 | 3/2 × 2^-32 | 2^-31 | 5/4 × 2^31 |
| p3109_k8p3uf | 8 | 3 | 6 | False | True | 127 | 126 | 2^-33 | 3/2 × 2^-32 | 2^-31 | 3/2 × 2^31 |
| p3109_k8p4se | 8 | 4 | 4 | True | True | 63 | 62 | 2^-10 | 7/4 × 2^-8 | 0.0078125 | 224 |
| p3109_k8p4sf | 8 | 4 | 4 | True | True | 63 | 63 | 2^-10 | 7/4 × 2^-8 | 0.0078125 | 240 |
| p3109_k8p4ue | 8 | 4 | 5 | True | True | 127 | 125 | 2^-18 | 7/4 × 2^-16 | 2^-15 | 53248 |
| p3109_k8p4uf | 8 | 4 | 5 | True | True | 127 | 126 | 2^-18 | 7/4 × 2^-16 | 2^-15 | 57344 |
| p3109_k8p7sf | 8 | 7 | 1 | True | True | 63 | 63 | 0.015625 | 63/32 × 2^-1 | 1 | 127/64 × 2^0 |
| p3109_k8p8uf | 8 | 8 | 1 | True | True | 127 | 126 | 0.0078125 | 127/64 × 2^-1 | 1 | 127/64 × 2^0 |
| binary16 | 16 | 11 | 5 | True | True | 15359 | 16383 | 2^-24 | 1023/512 × 2^-15 | 2^-14 | 65504 |
| bfloat16 | 16 | 8 | 8 | False | True | 16255 | 16383 | 2^-133 | 127/64 × 2^-127 | 2^-126 | 255/128 × 2^127 |
| ocp_e8m0 | 8 | 1 | 8 | False | True | 127 | 127 | n/a | n/a | 2^-127 | 2^127 |
| ocp_int8 | 8 | 8 | 0 | True | True | 63 | 63 | 0.015625 | 127/64 × 2^0 | n/a | n/a |
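To make the table columns concrete, here is a minimal pure-Python sketch that decodes every positive code point of a signed format and recomputes a few of the statistics. It does not use gfloat's API: `decode_ieee_like` and `value_stats` are hypothetical helpers written for this illustration. They assume the IEEE 754 layout (one sign bit, bias 2^(E-1) - 1, all-ones exponent reserved for infinities and NaNs), so they apply to rows such as binary16 and ocp_e5m2 only. Formats that encode special values or the bias differently, such as ocp_e4m3 or the P3109 family, are not modelled.

```python
import math


def decode_ieee_like(code: int, bits: int, precision: int) -> float:
    """Decode `code` as a signed IEEE-754-style value (illustrative helper only)."""
    t = precision - 1                # trailing significand (fraction) bits
    w = bits - precision             # exponent field width, since bits = 1 + w + t
    bias = (1 << (w - 1)) - 1        # IEEE-style exponent bias (assumption)
    sign = -1.0 if (code >> (bits - 1)) & 1 else 1.0
    exp = (code >> t) & ((1 << w) - 1)
    frac = code & ((1 << t) - 1)
    if exp == (1 << w) - 1:          # all-ones exponent: infinity or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                     # subnormal: no implicit leading 1
        return sign * math.ldexp(frac, 1 - bias - t)
    return sign * math.ldexp((1 << t) + frac, exp - bias - t)


def value_stats(bits: int, precision: int) -> dict:
    """Recompute a few table columns by enumerating all positive, nonzero codes."""
    positives = [decode_ieee_like(c, bits, precision) for c in range(1, 1 << (bits - 1))]
    finite = [v for v in positives if math.isfinite(v)]  # drops +Inf and NaNs
    return {
        "0<x<1": sum(v < 1 for v in finite),
        "1<x<Inf": sum(v > 1 for v in finite),
        "minSubnormal": min(finite),
        "maxNormal": max(finite),
    }


print(value_stats(8, 3))    # ocp_e5m2 parameters: B=8, P=3
print(value_stats(16, 11))  # binary16 parameters: B=16, P=11
```

Under these assumptions, the counts and extremes for B=8, P=3 match the ocp_e5m2 row (59 values in 0<x<1, 63 values in 1<x<Inf, maxNormal 57344), and those for B=16, P=11 match the binary16 row.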

Notes

All NaNs are treated as the same value, with no distinction between signalling and quiet NaNs, or between differently encoded NaNs.
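As an illustration of what "differently encoded NaNs" means, the sketch below counts the binary16 bit patterns that decode to NaN, using only Python's standard `struct` module rather than gfloat. The helper name `decode_binary16` is made up for this example. In a library that treats all NaNs alike, every one of these patterns would map to a single NaN value.

```python
import math
import struct


def decode_binary16(code: int) -> float:
    """Reinterpret a 16-bit integer as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", code))[0]


# Every code whose exponent field is all ones and whose fraction is nonzero is a NaN.
nan_codes = [c for c in range(1 << 16) if math.isnan(decode_binary16(c))]

print(len(nan_codes))                       # number of distinct NaN encodings
print(hex(nan_codes[0]), hex(nan_codes[-1]))
```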
