-
Notifications
You must be signed in to change notification settings - Fork 27
Add fallback Array vis to support stacks of JPEG/PNG images #1768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
): unknown { | ||
const { shape } = dataset; | ||
const dataArray = ndarray(value as ScalarValue<typeof dataset.type>[], shape); | ||
const mappedArray = applyMapping( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applyMapping
brings some unnecessary misdirection. Instead, I now call pick
and createArrayFromView
directly.
And since pick
always returns an array even when selecting a scalar, I add a condition to extract the scalar.
hasCompoundType(dataset) && | ||
hasPrintableCompoundType(dataset) && | ||
hasNonNullShape(dataset) && | ||
(hasScalarShape(dataset) || hasMinDims(dataset, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last line actually did the same check as hasNonNullShape
...
const slicedValue = h5wDataset.slice(ranges); | ||
assertNonNull(slicedValue); | ||
|
||
/* h5wasm unwraps scalar slices inconsistently - e.g. it does so for opaque | ||
* datasets but not for compound datasets */ | ||
if (isScalarSelection(selection) && Array.isArray(slicedValue)) { | ||
return slicedValue[0]; | ||
} | ||
|
||
return slicedValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open an issue on the h5wasm repo when I get the time.
).not.toThrow(); | ||
}); | ||
|
||
describe('assertDatasetValue', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate describe
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cf. create_h5_sample.py
at the bottom of the diff.
/approve |
I've just realised that this assumption that providers should return scalar values for scalar selections in array datasets is problematic in terms of typing. In With the changes in this PR, the return type I have to investigate the impact of changing the |
Looking at your screenshots, I wonder if the move should not be to extend the And to create a dedicated vis for JPEG/PNG ? For example, what if I want to inspect the binary blob for some reason ? I would expect to select the |
The limitation of the Matrix vis is that the space for each item is restricted: single row, fixed width. So vlens, arrays, complicated compounds, etc. wouldn't fit. We could imagine nesting the Matrix vis: if an item is a vlen/array/compound, allow the user to click on the cell to explore the value, but this would take more clicks than with the dimension mapper. The other problem is that we can't tell the dataset contains JPEG/PNG image(s) until we actually fetch its value. If we decide to trigger the Matrix vis by default for an opaque 1D dataset, for instance, we end up with a bunch of problems (no slider in the dimension mapper, Matrix tab that doesn't fit the visualization, etc.)
Yeah, it's true that in the new Array vis, users can only export one slice at a time to JSON. There's no way to see the entire dataset. If that's really a problem for someone, though, perhaps we could make the sliders togglable somehow? |
Gotcha. Yeah, in essence, you wanted a Raw vis but with a dimension mapper for things that we know are arrays but don't have a suitable visualization .
I didn't remember that, that's unfortunate 🙁 |
} | ||
|
||
export function isScalarSelection(selection: string | undefined): boolean { | ||
return selection !== undefined && /^\d+(?:,\d+)*$/u.test(selection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggle to understand what you mean by "scalar selection".
Is it a selection that will produce a scalar value ? If so, the test should check also the dataset shape ?
Is it a selection that only has scalar values inside ? In that case, perhaps "selected indices" would be a better name ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a selection that produces a scalar value, like 2,0,1
or 3
, as opposed to 2,0,:
or :
.
The dataset shape doesn't matter here: when we have a selection string, the dataset is expected to have an array shape (cf. assertDatasetValue
). The selection string is also expected to have the same number of elements as the number of dimensions in the dataset.
The problem I'm facing (which is why I've switched the PR to draft) is that the Value
type has to be aware of the selection so it can conditionally return ScalarValue
for "scalar selections".
I've managed to make it work with:
type ValueSelection = (number | null)[];
type ScalarSelection = number[];
export type Value<
D extends Dataset,
V extends ValueSelection | undefined = undefined,
> =
D extends Dataset<infer S, infer T>
? S extends ScalarShape
? ScalarValue<T>
: S extends ArrayShape
? V extends ScalarSelection
? ScalarValue<T>
: ArrayValue<T>
: never
: never;
It requires keeping the selection as an array for longer, which changes quite a few things.
I'm now trying to use the dimension mapping array directly instead of introducing a new structure. I'll keep you updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a selection that produces a scalar value, like 2,0,1 or 3, as opposed to 2,0,: or :.
I see. The issue I had is that I am used to Python where you can do stuff like threeD[3]
and get a 2D array. In essence, the 3
is a short hand for 3, :, :
.
But since we enforce the same number of elements as the number of dimensions in the dataset, we should never have this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, indeed. I think it will be clearer to keep using the dimension mapping structure throughout to be honest (and/or maybe rethink that structure somehow).
This partially addresses #1758 (I'll open an issue for the "video playback" part, as this is out of scope for the time being). It was first suggested in #1554.
The new Array visualisation acts as a fallback, like Raw, but specifically for array datasets. It appears for any array dataset that is not supported by any of the more powerful visualization, like Matrix or Compound. It basically shows the dimension mapper with only sliders (one for every dimension), retrieves a specific scalar value from the dataset, and passes it to
RawVis
.The vision I have, which will come together in my next PR, is to then merge the Raw and Scalar visualizations to end up with two "low-level" fallback visualizations: one for scalars and one for arrays. I think it will make a lot more sense.
The main challenge was dealing with "scalar selections" in the providers. I'm talking about
selection
strings of the form[0-9](,[0-9])*
(e.g.2
,0,1
,2,0,4
, etc.)h5grove seems to handle these scalar selections as expected — i.e. it returns a scalar value — but h5wasm and HSDS have some quirks... I wrote some comments in the code that are hopefully clear enough.
To test the new Array vis, the easiest is to look at the following datasets in
sample.h5
:/byte_string_1D
,/byte_string_2D
,/compound_array_vlen_1D
,/compound_mixed_2D
,/vlen_float32_1D
,/vlen_int64_1D
,/vlen_utf8_1D
. They all used to trigger the Raw visualization, but now they trigger the Array visualization instead:And to test with a stack of JPEG images, I've created a file called
frogs.h5
:ZIP with h5py script, images and
frogs.h5
: frogs.zip