@lucas-diedrich
Owner

This PR implements readers for Deep Visual Proteomics (DVP) imaging data. Imaging is one of the three modalities (imaging, proteomics, and cell segmentation/shape information) relevant to DVP experiments.

Currently implemented readers

  • spatialdata_io.experimental.czi.read_czi - Reader for Carl Zeiss imaging files. It wraps pylibczirw, the official Carl Zeiss Python I/O package for imaging data. The reader supports either a) a single RGB channel (as typically used for pathology/H&E stains) or b) an arbitrary number of grayscale channels.
  • spatialdata_io.experimental.openslide.read_openslide - Reader for various whole-slide imaging formats, with a focus on digital pathology data formats. It wraps openslide, a widely used and well-supported library for parsing pathology data. This reader is mainly implemented for its support of the MIRAX format. It parses RGB/RGBA images and returns an RGBA image.

General strategy

Both readers follow the same strategy. To avoid loading very large images into memory at once, they split the image into tiles, read the tiles lazily and in parallel using dask.delayed, and return the assembled array as an xarray object parsed by spatialdata.models.Image2DModel.

Implementation

This strategy is implemented with general functions in the _utils.py module and format-specific functions in the respective reader modules. The implementation follows the functional programming style of the other spatialdata_io readers.

LucaMarconato and others added 30 commits May 21, 2024 21:33
Support for `binned_ouputs` folder for Visium HD
updates:
- [github.com/asottile/pyupgrade: v3.15.2 → v3.16.0](asottile/pyupgrade@v3.15.2...v3.16.0)
@LucaMarconato

Hi @lucas-diedrich, I have noticed this PR from you. By any chance did you intend to open it against scverse/spatialdata-io/main? Currently it is opened against lucas-diedrich/spatialdata-io/main.

merscope reader remove invalid polygons
@lucas-diedrich
Owner Author

lucas-diedrich commented Jan 7, 2025

Hi @LucaMarconato, thanks for your message! I intended to open this PR against my fork to receive initial internal feedback, apologies for the confusion! I am still waiting for some feedback on the readers so it might still take some time until I open a real pull request.

We (the Mann Lab) are very interested in using spatialdata for the analysis of our DVP workflow. So far, I have implemented parsers for

  • the various and diverse imaging data (Carl Zeiss imaging data, Pathology Slides e.g. mirax, generic (tiff))
  • and cell segmentation output.

Parsers for various MS-based proteomics data formats will probably be available in the next few weeks.

Comparing our requirements with the implemented readers for spatial omics technologies, I noted that our workflow might deviate slightly from the other technologies: Usually, we first analyze the imaging data to identify cells of interest and only select a small subset for subsequent Mass Spectrometry-based profiling. In other words, we would usually have a sequential workflow in which we

  1. Load imaging data into the spatialdata format (microscopy files -> dask/xarray -> .images attribute)
  2. Compute cell segmentations with existing analysis pipelines (.xml files -> geopandas -> .shapes attribute)

We then perform the initial computational analysis to identify cells or areas of interest, and subsequently

  3. Create proteomics data for this small selection (various quantification engines/.tsv files -> anndata -> .tables attribute)

This is in contrast to the existing readers, which appear to be intended to load all data into the object at once. Overall, it is certainly true that our workflow and its output files are, at this point, far less streamlined than those of other spatial technologies, and some outputs might change in the future.

Therefore, my question would be what you would consider the best way to proceed:

  1. Would you be interested in me creating a pull request for the imaging readers alone (e.g. to spatialdata_io.experimental)?
  2. Would you like to wait until we have implemented a single wrapper for all required functionalities (images, cell segmentation, omics)?
  3. How would you deal with the different demands of our workflow?

Many thanks!
Best, Lucas

@LucaMarconato

Hi, thanks for the detailed explanation.

Would you be interested in me creating a pull request for the imaging readers alone (e.g. to spatialdata_io.experimental)?

A robust way to parse various image formats is on our roadmap. This work was only partially started in scverse#234 and in https://github.com/scverse/spatialdata-io/blob/main/src/spatialdata_io/readers/generic.py, so the read_openslide() function in this PR could generalize that work by delegating to openslide. Really exciting!

One would need to double-check the support for .ome.tiff, which is popular across spatial omics vendors. An example where we parse the OME-XML metadata is here: https://github.com/scverse/spatialdata-io/blob/2433e73172fe2e443ec5e895f5963504d459a6cc/src/spatialdata_io/readers/seqfish.py#L243. In conclusion, our code for parsing images was built in several steps, and read_openslide() seems to be a great way to bring things together.
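For context, OME-XML metadata can be parsed with the standard library alone. The sketch below is not the linked seqfish code: `parse_ome_metadata` and the embedded example document are hypothetical, and it assumes the 2016-06 OME schema namespace and a single image per file.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2016-06 OME schema (assumed; older files may use another).
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Hand-written example document, not taken from a real file.
EXAMPLE_OME_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint16"
            SizeX="2048" SizeY="1024" SizeC="2" SizeZ="1" SizeT="1"
            PhysicalSizeX="0.325" PhysicalSizeY="0.325">
      <Channel ID="Channel:0:0" Name="DAPI" SamplesPerPixel="1"/>
      <Channel ID="Channel:0:1" Name="CD45" SamplesPerPixel="1"/>
    </Pixels>
  </Image>
</OME>
"""


def parse_ome_metadata(xml_text):
    """Hypothetical helper: extract size, dtype, and channel names."""
    root = ET.fromstring(xml_text)
    pixels = root.find("ome:Image/ome:Pixels", OME_NS)
    if pixels is None:
        raise ValueError("No <Pixels> element found in OME-XML")
    channels = pixels.findall("ome:Channel", OME_NS)
    return {
        "size_x": int(pixels.get("SizeX")),
        "size_y": int(pixels.get("SizeY")),
        "size_c": int(pixels.get("SizeC")),
        "dtype": pixels.get("Type"),
        # Fall back to a positional name when a channel is unnamed.
        "channel_names": [
            ch.get("Name", f"channel_{i}") for i, ch in enumerate(channels)
        ],
    }


metadata = parse_ome_metadata(EXAMPLE_OME_XML)
```

With tifffile, the XML string of an .ome.tiff file should be available via `tifffile.TiffFile(path).ome_metadata`, so the channel names could be forwarded as channel coordinates when parsing the image.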

Would you like to wait until we have implemented a single wrapper for all required functionalities (images, cell segmentation, omics)?

Smaller PRs are easier to build and review, so I would split the work whenever possible.

How would you deal with the different demands of our workflow?

We support building, writing, and reading SpatialData objects incrementally/modularly, so I don't think the workflow you described would face any obstacle. Note that incremental reading is currently possible only via private APIs; we are working on making them public.

lucas-diedrich pushed a commit that referenced this pull request Feb 17, 2025
Update xenium.py, fix problem with hidden files in morphology_foucs direcory
Labels

enhancement New feature or request
