
Conversation

@axelboc (Contributor) commented Mar 14, 2025

This partially addresses #1758 (I'll open an issue for the "video playback" part, as this is out of scope for the time being). It was first suggested in #1554.

The new Array visualization acts as a fallback, like Raw, but specifically for array datasets. It appears for any array dataset that is not supported by any of the more powerful visualizations, like Matrix or Compound. It basically shows the dimension mapper with only sliders (one for every dimension), retrieves a specific scalar value from the dataset, and passes it to RawVis.

The vision I have, which will come together in my next PR, is to then merge the Raw and Scalar visualizations to end up with two "low-level" fallback visualizations: one for scalars and one for arrays. I think it will make a lot more sense.


The main challenge was dealing with "scalar selections" in the providers. I'm talking about selection strings of the form \d+(,\d+)* (e.g. 2, 0,1, 2,0,4, etc.)

h5grove seems to handle these scalar selections as expected — i.e. it returns a scalar value — but h5wasm and HSDS have some quirks... I wrote some comments in the code that are hopefully clear enough.
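To make the selection format concrete, here is a minimal sketch of such a check (the name `isScalarSelectionSketch` is made up for illustration; it mirrors the format described above, not necessarily the PR's actual helper):

```typescript
// Sketch: does a selection string pick out a single scalar element?
// Matches comma-separated non-negative integers, e.g. "2", "0,1", "2,0,4".
// Anything containing a slice marker like ":" does not match.
function isScalarSelectionSketch(selection: string): boolean {
  return /^\d+(?:,\d+)*$/u.test(selection);
}

console.log(isScalarSelectionSketch('2')); // true
console.log(isScalarSelectionSketch('0,1')); // true
console.log(isScalarSelectionSketch('2,0,:')); // false
```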


To test the new Array vis, the easiest is to look at the following datasets in sample.h5: /byte_string_1D, /byte_string_2D, /compound_array_vlen_1D, /compound_mixed_2D, /vlen_float32_1D, /vlen_int64_1D, /vlen_utf8_1D. They all used to trigger the Raw visualization, but now they trigger the Array visualization instead:

[Screenshots: three sample datasets shown before (Raw vis) and after (new Array vis)]
And to test with a stack of JPEG images, I've created a file called frogs.h5:

[Screenshots: before (Raw vis) and after (Array vis) for the stack of JPEG images]

ZIP with h5py script, images and frogs.h5: frogs.zip

): unknown {
const { shape } = dataset;
const dataArray = ndarray(value as ScalarValue<typeof dataset.type>[], shape);
const mappedArray = applyMapping(
@axelboc (author) commented on the diff:

applyMapping brings some unnecessary misdirection. Instead, I now call pick and createArrayFromView directly.

And since pick always returns an array, even when selecting a scalar, I've added a condition to extract the scalar.
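As a sketch of that unwrapping step (with `picked` standing in for the result of pick/createArrayFromView, which are not reproduced here; `extractIfScalar` is a made-up name):

```typescript
// Sketch: `picked` is assumed to always be an array, even when the
// selection resolves to a single element, so unwrap it in that case.
function extractIfScalar<T>(picked: T[], isScalarSelection: boolean): T | T[] {
  return isScalarSelection ? picked[0] : picked;
}

console.log(extractIfScalar([42], true)); // 42
console.log(extractIfScalar([1, 2, 3], false)); // [1, 2, 3]
```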

hasCompoundType(dataset) &&
hasPrintableCompoundType(dataset) &&
hasNonNullShape(dataset) &&
(hasScalarShape(dataset) || hasMinDims(dataset, 1))
@axelboc (author) commented on the diff:

The last line actually did the same check as hasNonNullShape...

Comment on lines +325 to +334
const slicedValue = h5wDataset.slice(ranges);
assertNonNull(slicedValue);

/* h5wasm unwraps scalar slices inconsistently - e.g. it does so for opaque
* datasets but not for compound datasets */
if (isScalarSelection(selection) && Array.isArray(slicedValue)) {
return slicedValue[0];
}

return slicedValue;
@axelboc (author) commented on the diff:

I'll open an issue on the h5wasm repo when I get the time.

).not.toThrow();
});

describe('assertDatasetValue', () => {
@axelboc (author) commented on the diff:

Duplicate describe block.

@axelboc (author) commented on the diff:

cf. create_h5_sample.py at the bottom of the diff.

@axelboc axelboc requested a review from loichuder March 14, 2025 14:58
@axelboc (author) commented Mar 14, 2025

/approve

@axelboc axelboc changed the title Add fallback Array vis for array datasets Add fallback Array vis to support stacks of JPEG/PNG images Mar 14, 2025
@axelboc axelboc marked this pull request as draft March 14, 2025 16:01
@axelboc axelboc removed the request for review from loichuder March 14, 2025 16:01
@axelboc (author) commented Mar 14, 2025

> h5grove seems to handle these scalar selections as expected — i.e. it returns a scalar value — but h5wasm and HSDS have some quirks... I wrote some comments in the code that are hopefully clear enough.

I've just realised that this assumption that providers should return scalar values for scalar selections in array datasets is problematic in terms of typing. In assertDatasetValue, the return type is Value<Dataset>, which only cares about the dataset's shape: if it's ArrayShape, we get ArrayValue, if it's ScalarShape, we get ScalarValue.

With the changes in this PR, the return type Value<Dataset<ArrayShape, T>> should actually resolve to ScalarValue<T> | ArrayValue<T> instead of just ArrayValue<T>. So assertDatasetValue is lying.

I have to investigate the impact of changing the Value type.

@loichuder (Member) commented Mar 17, 2025

Looking at your screenshots, I wonder if the move should not be to extend the Matrix vis to support these unsupported arrays.

And to create a dedicated vis for JPEG/PNG?

For example, what if I want to inspect the binary blob for some reason? I would expect to select the Raw vis in this case.

@axelboc (author) commented Mar 17, 2025

> Looking at your screenshots, I wonder if the move should not be to extend the Matrix vis to support these unsupported arrays.

The limitation of the Matrix vis is that the space for each item is restricted: single row, fixed width. So vlens, arrays, complicated compounds, etc. wouldn't fit. We could imagine nesting the Matrix vis: if an item is a vlen/array/compound, allow the user to click on the cell to explore the value, but this would take more clicks than with the dimension mapper.

The other problem is that we can't tell whether a dataset contains JPEG/PNG image(s) until we actually fetch its value. If we decide to trigger the Matrix vis by default for an opaque 1D dataset, for instance, we end up with a bunch of problems (no slider in the dimension mapper, Matrix tab that doesn't fit the visualization, etc.)
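For reference, once the value is fetched, detecting the format itself is straightforward, since JPEG and PNG payloads start with well-known magic bytes. A hedged sketch (not the PR's code; `sniffImageType` is a made-up name):

```typescript
// Sketch: sniff an opaque dataset value for image magic bytes.
// JPEG starts with FF D8 FF; PNG starts with 89 50 4E 47.
function sniffImageType(bytes: Uint8Array): 'jpeg' | 'png' | undefined {
  if (bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return 'jpeg';
  }
  if (
    bytes[0] === 0x89 && bytes[1] === 0x50 &&
    bytes[2] === 0x4e && bytes[3] === 0x47
  ) {
    return 'png';
  }
  return undefined;
}
```

The point above stands regardless: this check can only run after the (potentially large) value has been fetched, so it can't inform the default vis choice.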

> For example, what if I want to inspect the binary blob for some reason? I would expect to select the Raw vis in this case.

Yeah, it's true that in the new Array vis, users can only export one slice at a time to JSON. There's no way to see the entire dataset. If that's really a problem for someone, though, perhaps we could make the sliders togglable somehow?

@loichuder (Member):

> The limitation of the Matrix vis is that the space for each item is restricted: single row, fixed width. So vlens, arrays, complicated compounds, etc. wouldn't fit. We could imagine nesting the Matrix vis: if an item is a vlen/array/compound, allow the user to click on the cell to explore the value, but this would take more clicks than with the dimension mapper.

Gotcha. Yeah, in essence, you wanted a Raw vis but with a dimension mapper for things that we know are arrays but don't have a suitable visualization.

> The other problem is that we can't tell the dataset contains JPEG/PNG image(s) until we actually fetch its value.

I didn't remember that, that's unfortunate 🙁

}

export function isScalarSelection(selection: string | undefined): boolean {
return selection !== undefined && /^\d+(?:,\d+)*$/u.test(selection);
@loichuder (Member) commented on the diff:

I struggle to understand what you mean by "scalar selection".

Is it a selection that will produce a scalar value? If so, shouldn't the test also check the dataset shape?

Is it a selection that only has scalar values inside? In that case, perhaps "selected indices" would be a better name?

@axelboc (author):

It's a selection that produces a scalar value, like 2,0,1 or 3, as opposed to 2,0,: or :.

The dataset shape doesn't matter here: when we have a selection string, the dataset is expected to have an array shape (cf. assertDatasetValue). The selection string is also expected to have the same number of elements as the number of dimensions in the dataset.
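To illustrate that expected correspondence, a scalar selection string can be derived from one slicing index per dimension. This is a sketch under that assumption (`toScalarSelection` is a made-up helper; the real dimension-mapping structure in h5web also carries axis markers and is not reproduced here):

```typescript
// Sketch: build a scalar selection string from one index per dimension.
// The number of indices must equal the dataset's number of dimensions.
function toScalarSelection(indices: number[], dims: number[]): string {
  if (indices.length !== dims.length) {
    throw new Error('Selection must provide one index per dimension');
  }
  if (indices.some((index, d) => index < 0 || index >= dims[d])) {
    throw new Error('Index out of bounds');
  }
  return indices.join(',');
}

console.log(toScalarSelection([2, 0, 1], [5, 3, 4])); // "2,0,1"
```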


The problem I'm facing (which is why I've switched the PR to draft) is that the Value type has to be aware of the selection so it can conditionally return ScalarValue for "scalar selections".

I've managed to make it work with:

type ValueSelection = (number | null)[];
type ScalarSelection = number[];

export type Value<
  D extends Dataset,
  V extends ValueSelection | undefined = undefined,
> =
  D extends Dataset<infer S, infer T>
    ? S extends ScalarShape
      ? ScalarValue<T>
      : S extends ArrayShape
        ? V extends ScalarSelection
          ? ScalarValue<T>
          : ArrayValue<T>
        : never
    : never;

It requires keeping the selection as an array for longer, which changes quite a few things.

I'm now trying to use the dimension mapping array directly instead of introducing a new structure. I'll keep you updated.

@loichuder (Member):

> It's a selection that produces a scalar value, like 2,0,1 or 3, as opposed to 2,0,: or :.

I see. The issue I had is that I am used to Python, where you can do stuff like threeD[3] and get a 2D array. In essence, the 3 is a shorthand for 3, :, :.

But since we enforce the same number of elements as the number of dimensions in the dataset, we should never have this.

@axelboc (author) commented Mar 17, 2025:

Ah right, indeed. I think it will be clearer to keep using the dimension mapping structure throughout to be honest (and/or maybe rethink that structure somehow).
