
Add scores_precision parameter to FeatureAttributionOutput.save #202

@gsarti

Description


This issue addresses the high space requirements of large attribution score tensors by adding a scores_precision parameter to the FeatureAttributionOutput.save method.

Proposed by: @g8a9

Motivation

Currently, tensors in FeatureAttributionOutput objects (attributions and step scores) are serialized in float32 precision as a default when using out.save(). While it is possible to compress the representation of these values with ndarray_compact=True, the resulting JSON files are usually quite large. Using more parsimonious data types could reduce the size of saved objects and facilitate systematic analyses leveraging large amounts of data.

Proposal

float32 precision should probably remain the default behavior, as we do not want to cause any information loss by default.

float16 and float8 should also be considered, in both signed and unsigned variants, since leveraging the strictly positive nature of some score types allows preserving greater precision at the same byte width. Unsigned variants would be used as defaults whenever a tensor contains no negative scores.
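The signed/unsigned default selection described above could be sketched as follows (a hypothetical helper, using NumPy for illustration; the names are assumptions, not the actual inseq implementation):

```python
import numpy as np

def choose_int_dtype(scores: np.ndarray) -> np.dtype:
    """Pick an unsigned 8-bit type when all scores are non-negative,
    doubling the usable resolution at the same storage cost; fall back
    to the signed type when negative scores are present."""
    return np.dtype(np.uint8) if scores.min() >= 0 else np.dtype(np.int8)
```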

float16 can easily be used by casting tensors to the native torch.float16 data type, which would preserve precision up to 4 decimal places for scores normalized in the [-1, 1] interval (8 for unsigned tensors). This corresponds to 2 or 4 decimal places for float8. However, float8 is not supported natively in PyTorch, so such tensors would need to be stored as torch.int8 or torch.uint8 and converted back to floats upon reloading the object.
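A minimal round-trip sketch of the float8-style path, storing normalized scores as 8-bit integers with a scale factor recovered on load (NumPy for illustration; function names and the scaling scheme are assumptions, not the proposed implementation):

```python
import numpy as np

def quantize_8bit(scores: np.ndarray) -> tuple[np.ndarray, float]:
    """Map scores in [-1, 1] onto int8, or [0, 1] onto uint8 when all
    scores are non-negative. The scale factor must be kept alongside
    the serialized tensor for dequantization."""
    unsigned = scores.min() >= 0
    scale = 255.0 if unsigned else 127.0
    dtype = np.uint8 if unsigned else np.int8
    return np.round(scores * scale).astype(dtype), scale

def dequantize_8bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 scores when reloading the object."""
    return q.astype(np.float32) / scale

scores = np.array([-0.52, 0.0, 0.73], dtype=np.float32)
q, scale = quantize_8bit(scores)
restored = dequantize_8bit(q, scale)
# round-trip error is bounded by 1 / (2 * scale)
```

The same logic would apply to torch.int8 / torch.uint8 tensors; the cast back to float happens transparently when the saved object is loaded.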
