Proposal: Introduce ExternalImage Data Type #603

@h-mayorquin

This proposal suggests adding a new ExternalImage data type to the NWB schema. The goal is to store references to images in their original file formats (e.g., PNG) instead of encoding them as dense arrays within NWB.

To-do (from the discussion with the TAB)

  • Add examples of API usage with this change.

Short Summary of Benefits

The following applies for conversions that use a large number of images:

  • Simplification of writing NWB files: Converting images to dense arrays requires large amounts of memory, computational resources, and/or complex code to write buffered data (e.g. HDMF Iterators). Writing an external image requires no special computational resources.
  • Appropriate compression enabled by default: Standard image formats like PNG and JPEG already include compression, which avoids the common pitfall of writing large NWB files without compression.
  • Metadata preservation: Standard image formats already store essential metadata that is not included in the current Image neurodata type, improving the provenance of the data for future analysis.
  • Simplification of data types: The ExternalImage data type would allow support for image modes not currently handled by the schema (e.g., grayscale with alpha, HSV, LAB, etc.) without adding new neurodata types.
  • Avoiding data duplication: For multiple-session experiments that use the same set of images, duplication is avoided as all experiments can reference the same battery of images.

Implementation

Draft PR

The proposal includes the addition of two neurodata types to the schema: BaseImage and ExternalImage. The former will be the parent of the current Image and also the parent of the ExternalImage. This design allows future users to extend these two base data types in very different directions.

Motivation

I am working with a lab where a large battery of PNG images (of various modes like RGBA, RGB, and LA) is used as stimuli to probe neural activity. These images are then referenced with an IndexSeries as they are presented multiple times during the experiment. In one experiment, ~8,400 PNG images are used as stimuli, with similar numbers used in other experiments.

The current approach in NWB is to convert these images to dense arrays and store them as Image. However, converting them to dense arrays in NWB requires ~250 GiB of RAM and yields similarly large NWB files when written without compression.
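As a rough back-of-the-envelope check of the figures above, the sketch below divides the quoted ~250 GiB by the ~8,400 images to get the mean dense size per image. The helper `dense_nbytes` and the 1024x1024 RGBA example are illustrative assumptions, not measurements from the experiment.

```python
GIB = 2**30
MIB = 2**20


def dense_nbytes(height: int, width: int, channels: int, itemsize: int = 1) -> int:
    """Bytes needed to hold one decoded image as a dense array."""
    return height * width * channels * itemsize


# Figures quoted in this issue (both are approximate).
n_images = 8_400
total_dense_bytes = 250 * GIB

# Mean size of one image once decoded to a dense array.
mean_bytes_per_image = total_dense_bytes / n_images
print(f"mean dense size per image: {mean_bytes_per_image / MIB:.1f} MiB")

# Hypothetical example: a 1024x1024 RGBA image with 8 bits per channel.
example_bytes = dense_nbytes(height=1024, width=1024, channels=4)
print(f"1024x1024 RGBA uint8: {example_bytes / MIB:.1f} MiB")
```

The dense size depends only on the pixel dimensions, channel count, and item size, so it is independent of how well the original PNG compresses.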

For this and similar experimental paradigms, converting the data to NWB poses challenges. Users would need to implement iterators to avoid running out of memory, as well as develop code to map their images to the appropriate neurodata types (RGB vs. RGBA). Moreover, writing uncompressed data results in excessively large datasets. Although DEFLATE, the compression that PNG uses, is available in both the HDF5 and Zarr backends, my experiments show that the compressed representations in NWB files are between 5% and 20% larger than the total size of the original image files. Additionally, if the same stimuli are used in multiple experiments, the images are duplicated in each NWB file.

How Does ExternalImage Help?

  • Writing an external image requires minimal computational resources (writing a string with a path and possibly some metadata).
  • Native image formats already contain compression (e.g., DEFLATE in PNG).
  • For multiple-session experiments using the same set of images, duplication is avoided as all experiments can reference the same image set.
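The points above can be illustrated with a minimal sketch. `ExternalImageRef` and `reference_images` are hypothetical stand-ins invented for this example; they are not the proposed schema or any existing PyNWB API. The sketch only shows that "writing" an external image amounts to recording a path string, and that several sessions can point at the same set of files.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExternalImageRef:
    """Hypothetical stand-in for an ExternalImage entry (not the NWB schema)."""

    name: str
    file_path: str  # path relative to the parent of the stimulus directory


def reference_images(image_dir: Path) -> list:
    """Build one reference per PNG file without decoding any pixel data."""
    refs = []
    for path in sorted(image_dir.glob("*.png")):
        relative = path.relative_to(image_dir.parent).as_posix()
        refs.append(ExternalImageRef(name=path.stem, file_path=relative))
    return refs


with tempfile.TemporaryDirectory() as tmp:
    stim_dir = Path(tmp) / "stimuli"
    stim_dir.mkdir()
    for stem in ("img_000", "img_001"):
        # Empty placeholder files, used only to demonstrate the bookkeeping.
        (stim_dir / f"{stem}.png").write_bytes(b"")

    refs = reference_images(stim_dir)

    # Two sessions share the same references instead of duplicating pixels.
    session_a = {"session_id": "A", "images": refs}
    session_b = {"session_id": "B", "images": refs}

print([ref.file_path for ref in refs])
```

No image is opened or decoded here, so the memory cost is a handful of short strings regardless of image size.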

Other Benefits of ExternalImage

Metadata Preservation

Standard image formats like PNG and JPEG already store essential metadata not included in the current Image neurodata type:

  • Bit depth
  • Color space
  • Gamma correction

While improving the Image neurodata type with these attributes is possible, it would duplicate work in an already well-established field.
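To make the metadata point concrete for PNG, the sketch below reads the fixed-layout IHDR chunk at the start of a PNG file using only the standard library. It recovers the dimensions, bit depth, and color type from the first 29 bytes. The function name `read_png_header` is an assumption for illustration; gamma lives in a separate optional chunk (gAMA) that this sketch does not parse.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes defined by the PNG specification.
PNG_COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor (RGB)",
    3: "indexed (palette)",
    4: "grayscale with alpha",
    6: "truecolor with alpha (RGBA)",
}


def read_png_header(data: bytes) -> dict:
    """Parse the signature and IHDR chunk from the first 29 bytes of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    width, height, bit_depth, color_type, _compression, _filter, _interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": PNG_COLOR_TYPES.get(color_type, f"unknown ({color_type})"),
    }


# Usage with a file on disk (the path is a placeholder):
# with open("stimulus.png", "rb") as f:
#     print(read_png_header(f.read(29)))
```

Only the header is read, so this metadata is available without decoding the image into a dense array.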

Simplification of Data Types

Currently, NWB defines separate neurodata types for RGBImage, GrayImage, and RGBAImage. However, the Python Imaging Library (PIL) supports the following modes:

  • 1 (1-bit pixels, black and white, stored with one pixel per byte)
  • L (8-bit pixels, grayscale)
  • P (8-bit pixels, mapped to any other mode using a color palette)
  • RGB (3x8-bit pixels, true color)
  • RGBA (4x8-bit pixels, true color with transparency mask)
  • CMYK (4x8-bit pixels, color separation)
  • YCbCr (3x8-bit pixels, color video format). Note that this refers to the JPEG, and not the ITU-R BT.2020, standard.
  • LAB (3x8-bit pixels, the L*a*b color space)
  • HSV (3x8-bit pixels, Hue, Saturation, Value color space). Hue’s range of 0-255 is a scaled version of 0 degrees <= Hue < 360 degrees.
  • I (32-bit signed integer pixels)
  • F (32-bit floating point pixels)
  • LA (L with alpha)
  • PA (P with alpha)
  • RGBX (true color with padding)
  • RGBa (true color with premultiplied alpha)
  • La (L with premultiplied alpha)
  • I;16 (16-bit unsigned integer pixels)
  • I;16L (16-bit little endian unsigned integer pixels)
  • I;16B (16-bit big endian unsigned integer pixels)
  • I;16N (16-bit native endian unsigned integer pixels)

(Source: Pillow Documentation)

ExternalImage would support all of these modes, avoiding the proliferation of neurodata types and reducing the schema’s maintenance burden.
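The gap between the mode list and the three existing types can be tallied with a short sketch. The mapping `NWB_TYPE_FOR_MODE` is a rough, assumed correspondence by name only (for instance, some integer grayscale modes might also be storable as GrayImage with a different dtype), so the count is indicative rather than a statement about what the schema can or cannot represent.

```python
# Modes as listed in the Pillow documentation quoted above.
PILLOW_MODES = [
    "1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr", "LAB", "HSV", "I", "F",
    "LA", "PA", "RGBX", "RGBa", "La", "I;16", "I;16L", "I;16B", "I;16N",
]

# Assumed, name-based correspondence to the current NWB image types.
NWB_TYPE_FOR_MODE = {
    "L": "GrayImage",
    "RGB": "RGBImage",
    "RGBA": "RGBAImage",
}

# Modes with no direct counterpart under the rough mapping above.
without_counterpart = [m for m in PILLOW_MODES if m not in NWB_TYPE_FOR_MODE]

print(f"{len(PILLOW_MODES)} modes listed, "
      f"{len(without_counterpart)} without a direct counterpart")
print(without_counterpart)
```

Under this mapping, the LA images mentioned in the motivation fall in the group without a direct counterpart.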

Related and Prior Art

External References in ImageSeries

The ImageSeries neurodata type already includes an external_file field that references an external file, commonly used for video files. The considerations for this proposal are similar to those that led to the external_file field in ImageSeries: video data is computationally intensive to transform, compression is already available in the video file format, and the same video file can be reused across experiments.

See #462.

Adding Images as Byte Strings

Images could be added as byte strings to the NWB file, achieving some of the advantages described here while keeping all data in a single file (which might be convenient in certain scenarios). However, this approach has significantly higher costs in terms of API design, maintenance, and security.

See Issue #574.
