DVP image readers #1

lucas-diedrich · 2024-11-27T09:42:22Z

This PR implements readers for DVP imaging data. Imaging is one of the three modalities (imaging, proteomics, and cell segmentation/shape information) relevant to DVP experiments.

Currently implemented readers

- spatialdata_io.experimental.czi.read_czi: Reader for Carl Zeiss imaging files. It wraps pylibczirw, the official Carl Zeiss Python I/O package for imaging data. The reader supports either (a) reading a single RGB channel (as typically used for pathology/H&E stains) or (b) reading an arbitrary number of grayscale channels.
- spatialdata_io.experimental.openslide.read_openslide: Reader for various whole-slide imaging formats, with a focus on digital pathology data formats. It wraps openslide, a widely used and well-supported library for parsing pathology data. This reader is implemented mainly for its support of the MIRAX format. It parses RGB/RGBA images and returns an RGBA image.

General strategy

Both readers follow the same strategy. To avoid loading very large images into memory at once, they tile the image, read the tiles in parallel via dask.delayed, and return the assembled xarray object, parsed by spatialdata.models.Image2DModel.
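As a rough illustration of this tiling strategy (not the code in this PR), the sketch below builds a lazy `(c, y, x)` dask array from delayed tile reads. `read_tile` and `read_tiled` are hypothetical names, and the in-memory NumPy array only stands in for a file handle; a real reader would request each region from pylibczirw or openslide instead.

```python
import dask
import dask.array as da
import numpy as np


def read_tile(source: np.ndarray, y0: int, x0: int, height: int, width: int) -> np.ndarray:
    """Hypothetical stand-in for a format-specific tile reader.

    Here the "file" is an in-memory (c, y, x) array that we simply slice.
    """
    return source[:, y0 : y0 + height, x0 : x0 + width]


def read_tiled(source: np.ndarray, tile_size: int) -> da.Array:
    """Assemble a lazy (c, y, x) dask array from delayed tile reads."""
    n_channels, n_rows, n_cols = source.shape
    lazy_read = dask.delayed(read_tile)
    rows = []
    for y0 in range(0, n_rows, tile_size):
        height = min(tile_size, n_rows - y0)  # last row of tiles may be shorter
        row = []
        for x0 in range(0, n_cols, tile_size):
            width = min(tile_size, n_cols - x0)  # last column may be narrower
            tile = lazy_read(source, y0, x0, height, width)
            # Declare shape and dtype up front so nothing is read yet.
            row.append(
                da.from_delayed(tile, shape=(n_channels, height, width), dtype=source.dtype)
            )
        rows.append(row)
    # Inner lists are concatenated along x, outer lists along y.
    return da.block(rows)


image = np.zeros((3, 5, 7), dtype=np.uint8)
lazy = read_tiled(image, tile_size=4)
print(lazy.chunks)  # ((3,), (4, 1), (4, 3))
```

Tiles are only read when the array is computed or written, so the whole image never has to be held in memory at once. The resulting dask array is the kind of object that would then be handed to spatialdata.models.Image2DModel for parsing.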

Implementation

This strategy is implemented with general functions in the _utils.py module and format-specific functions in the respective modules. The implementation follows the functional programming style of the other spatialdata_io readers.

LucaMarconato · 2025-01-05T14:36:13Z

Hi @lucas-diedrich, I noticed this PR of yours. Did you by any chance intend to open it against scverse/spatialdata-io/main? It is currently opened against lucas-diedrich/spatialdata-io/main.

lucas-diedrich · 2025-01-07T14:29:20Z

Hi @LucaMarconato, thanks for your message! I intended to open this PR against my fork to receive initial internal feedback, apologies for the confusion! I am still waiting for some feedback on the readers so it might still take some time until I open a real pull request.

We (the Mann Lab) are very interested in using spatialdata for the analysis of our DVP workflow. So far, I have implemented parsers for

- the various imaging data formats (Carl Zeiss imaging data, pathology slides such as MIRAX, and generic TIFF)
- cell segmentation output.

Parsers for various MS-based proteomics data formats will probably be available in the next few weeks.

Comparing our requirements with the implemented readers for spatial omics technologies, I noted that our workflow might deviate slightly from the other technologies: Usually, we first analyze the imaging data to identify cells of interest and only select a small subset for subsequent Mass Spectrometry-based profiling. In other words, we would usually have a sequential workflow in which we

1. Load imaging data into the spatialdata format (microscopy files -> dask/xarray -> .images attribute)
2. Compute cell segmentations with existing analysis pipelines (.xml files -> geopandas -> .shapes attribute)

Then perform the initial computational analysis to identify cells or areas of interest, and subsequently

3. Create proteomics data for this small selection (various quantification engines/.tsv files -> anndata -> .tables attribute)

This is in contrast to the existing readers, which appear to be intended to load all data into the object at once. Overall, it is certainly true that our workflow and the generated output files are, at this point, far less streamlined than those of other spatial technologies, and some outputs might change in the future.

Therefore, my question would be what you would consider the best way to proceed:

Would you be interested in me creating a pull request for the imaging readers alone (e.g. to spatialdata_io.experimental)?
Would you like to wait until we have implemented a single wrapper for all required functionalities (images, cell segmentation, omics)?
How would you deal with the different demands of our workflow?

Many thanks!
Best, Lucas

LucaMarconato · 2025-02-08T15:59:36Z

Hi, thanks for the detailed explanation.

> Would you be interested in me creating a pull request for the imaging readers alone (e.g. to spatialdata_io.experimental)?

Having a robust way to parse various image formats is on our roadmap. This work has only been partially started, in scverse#234 and in https://github.com/scverse/spatialdata-io/blob/main/src/spatialdata_io/readers/generic.py, so the read_openslide() function in this PR could generalize that work by delegating to openslide. Really exciting!

One would need to double-check the support for .ome.tiff, which is popular among spatial omics vendors. An example where we parse the OME-XML metadata is here: https://github.com/scverse/spatialdata-io/blob/2433e73172fe2e443ec5e895f5963504d459a6cc/src/spatialdata_io/readers/seqfish.py#L243. In conclusion, our image-parsing code was built in several steps, and read_openslide() seems to be a great way to bring things together.
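To make the OME-XML point concrete, here is a minimal sketch of pulling image size and channel names out of an OME-XML string with the standard library. It is not the code in seqfish.py; `parse_ome_channels` and the example document are made up for illustration, and the namespace assumes the 2016-06 OME schema (other files may declare a different schema version).

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the 2016-06 OME schema namespace.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}


def parse_ome_channels(ome_xml: str) -> dict:
    """Extract image size and channel names from the first Image element."""
    root = ET.fromstring(ome_xml)
    pixels = root.find("ome:Image/ome:Pixels", OME_NS)
    if pixels is None:
        raise ValueError("No Image/Pixels element found in OME-XML")
    # Fall back to the channel ID when no human-readable Name is given.
    channel_names = [
        channel.get("Name", channel.get("ID"))
        for channel in pixels.findall("ome:Channel", OME_NS)
    ]
    return {
        "size_x": int(pixels.get("SizeX")),
        "size_y": int(pixels.get("SizeY")),
        "channel_names": channel_names,
    }


# Hand-written example document, not taken from a real vendor file.
EXAMPLE_OME_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint16"
            SizeX="2048" SizeY="1024" SizeC="2" SizeZ="1" SizeT="1">
      <Channel ID="Channel:0:0" Name="DAPI"/>
      <Channel ID="Channel:0:1" Name="CD45"/>
    </Pixels>
  </Image>
</OME>"""

print(parse_ome_channels(EXAMPLE_OME_XML))
```

A generic reader would need this kind of metadata (at least the channel names) to label the `c` dimension consistently across vendors.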

> Would you like to wait until we have implemented a single wrapper for all required functionalities (images, cell segmentation, omics)?

Smaller PRs are easier to build and review, so I would split the work whenever possible.

> How would you deal with the different demands of our workflow?

We support building, writing, and reading SpatialData objects incrementally/modularly, so I don't think there would be an obstacle to the workflow you described (incremental reading is currently possible only via private APIs; we are working on making them public).

LucaMarconato and others added 30 commits May 21, 2024 21:33

using points for bins

adfd735

experiment with relabeling; I will remove the code

87d6cab

cleanup

f56639f

fix spatialdata dependency

dbab732

update

8bc283c

stereoseq fixes

b3b5a51

changelog

22c4e27

add stereoseq to readme

cf9c749

cleanup

a0c1ae9

Merge pull request scverse#70 from LLehner/Stereo-seq_reader

3599dff

Add reader for Stereo-seq files.

Merge branch 'main' into giovp/codecov

1b33147

Merge pull request scverse#148 from scverse/giovp/codecov

f1e0ea8

update codecov

fix dataframe import

f88c177

support for binned_outputs folder for visium hd

ba32593

xenium table optional

7b0f234

merscope reader update

fe4813e

fix mypy and flake8

954c5b6

code review from quentin

bce8af9

Merge pull request scverse#151 from scverse/xenium_no_table

fc67aa2

Made Xenium table optional

Merge pull request scverse#149 from scverse/fix/visium_hd_bins

81ca99e

Support for `binned_ouputs` folder for Visium HD

Merge pull request scverse#152 from quentinblampey/merscope_rioxarray

55b1071

merscope reader update

[pre-commit.ci] pre-commit autoupdate

4e42a45

updates: - [github.com/asottile/pyupgrade: v3.15.2 → v3.16.0](asottile/pyupgrade@v3.15.2...v3.16.0)

Merge pull request scverse#154 from scverse/pre-commit-ci-update-config

5d42716

[pre-commit.ci] pre-commit autoupdate

fix bin_size parsing visium hd

581f984

improvements seqfish reader

ed8c294

Merge branch 'main' into seqFISH_reader

1a35e7d

fix pre-commit

e4b8daa

Merge pull request scverse#53 from LLehner/seqFISH_reader

585f055

Reader for seqFISH data

Merge pull request scverse#158 from scverse/fix_visium_hd_bins

a252a1f

fix bin_size parsing visium hd

fix outdated function

f3b57e8

pre-commit-ci bot and others added 12 commits January 2, 2025 18:53

[pre-commit.ci] auto fixes from pre-commit.com hooks

179a042

for more information, see https://pre-commit.ci

fix typo

6a0e5a7

Merge branch 'visium_hd' of https://github.com/ArneDefauw/spatialdata-io

261dbbb

into visium_hd

Merge pull request scverse#190 from scverse/pre-commit-ci-update-config

147126d

[pre-commit.ci] pre-commit autoupdate

Merge pull request scverse#211 from ArneDefauw/visium_hd

681097d

Visium hd

update changelog

92dbde1

improved release process

a29ecb4

[pre-commit.ci] auto fixes from pre-commit.com hooks

2e15fff

for more information, see https://pre-commit.ci

added release.yml for automatic release note generation

75c70eb

Merge branch 'improved-release-process' of https://github.com/scverse…

517ffcc

…/spatialdata-io into improved-release-process

Merge pull request scverse#254 from scverse/improved-release-process

c0792b0

improved release process

Update README.md with: solutions to common problems

d4b9c8c

Merge pull request scverse#207 from ckmah/main

a5dfb92

merscope reader remove invalid polygons

Lucas Diedrich added 10 commits January 7, 2025 15:35

[Test] Removed unused arguments

d86e0c2

[Fix] Fix incorrect construction of dask arrays that leads to memory …

86819f3

…overflow. Construct dask arrays directly after reading the respective numpy array

[Fix] Remove unnecessary call

e439f40

[Fix] Adjust logic to account for fixes in delayed calls

a281ebf

[Fix] Adjust logic to account for fixes in delayed calls

de8de88

Merge branch 'main' of https://github.com/scverse/spatialdata-io into…

6deee23

… dvp_image_readers

[CI/CD] Add test data

886dacc

Removed unnecessary import

7dc113b

[Tests] Update tests for chunk factory

0f17fe2

[Test] Removed unnecessary arguments

6fa489f

Mr-Milk mentioned this pull request Jan 31, 2025

Efficient whole slide imaging IO scverse/spatialdata#856

Open

lucas-diedrich mentioned this pull request Feb 8, 2025

Add universal image loader scverse/spatialdata-io#234

Draft

lucas-diedrich pushed a commit that referenced this pull request Feb 17, 2025

Merge pull request #1 from psl-schaefer/psl-schaefer-patch-1

d0be188

Update xenium.py, fix problem with hidden files in morphology_foucs direcory

lucas-diedrich mentioned this pull request Feb 17, 2025

Chunkwise image loader scverse/spatialdata-io#279

Open
