This thread is meant to talk through the different ways one might approach storage of cloud-optimized geospatial datasets (thinking large array data) from the point of view of an IPFS "data provider" / host.
Things we should think about:
- Which best practices, node configuration, and hosting infrastructure make sense?
- What kinds of trade-offs / goals should we be considering when exploring different approaches?
One thing I'm wondering is what configuration we want with respect to IPFS's datastore. In IPFS's default configuration, data is copied in full into the IPFS internal datastore (usually somewhere like ~/.ipfs/....). Two alternative options (both experimental features) are listed below, with a rough usage sketch after the list:
- `ipfs filestore`: files are not copied to the datastore; instead, the existing files on disk are used to deliver content to other nodes on the network
- `ipfs urlstore`: files are not copied to the datastore, but are retrieved from a URL over HTTP
  - maybe this could work with existing cloud-optimized datasets on S3 if our goal is simply to expose pre-existing datasets over the IPFS network with CIDs?
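For concreteness, here's a minimal sketch of enabling and using both modes with the kubo CLI, driven from Python via subprocess. The local file path and S3 URL are placeholders, and the exact commands/flags (e.g. `ipfs urlstore add`) may differ across kubo versions, so treat this as an assumption-laden illustration rather than a recipe:

```python
import subprocess

def ipfs(*args: str) -> str:
    """Run an ipfs CLI command and return its stdout (assumes kubo is installed and the repo is initialized)."""
    return subprocess.run(["ipfs", *args], check=True, capture_output=True, text=True).stdout.strip()

# Filestore: enable the experimental feature, then add a file without copying it into
# the datastore; --nocopy keeps only block references back to the file on disk.
# (If a daemon is running, it needs a restart after the config change.)
ipfs("config", "--json", "Experimental.FilestoreEnabled", "true")
file_cid = ipfs("add", "--nocopy", "--quieter", "/data/example.nc")  # hypothetical local file

# Urlstore: enable the experimental feature, then register a URL so the node serves
# those bytes by fetching them over HTTP on demand instead of storing them locally.
ipfs("config", "--json", "Experimental.UrlstoreEnabled", "true")
url_cid = ipfs("urlstore", "add", "https://example-bucket.s3.amazonaws.com/example.nc")  # hypothetical URL

print(file_cid, url_cid)
```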
I get the sense that the biggest place for impact is actually in the direction of what @rsignell was initially proposing: doing something similar to virtualizarr, so that non-ARCO datasets (e.g. netCDF) can be accessed via range requests when stored on IPFS. These IPFS "range requests" would crawl the IPLD DAG to fetch a subset of IPFS blocks from a dataset, similar to HTTP range requests in existing cloud-optimized geospatial data workflows.
If that ends up being our goal, we probably want to leverage either `ipfs filestore` or `ipfs urlstore` (see above) so that these large datasets don't have to get copied over. If users are comfortable duplicating the data from a large netCDF file into the IPFS datastore, then they might as well reformat it as zarr (which, from my first round of notebook experiments in #1, already plays pretty nicely with IPFS/IPLD). A rough sketch of what such a "range request" might look like follows.
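To make that concrete, here's a minimal sketch assuming a virtualizarr-style chunk manifest (mapping each zarr chunk key to a CID plus byte offset/length, instead of a URL plus offset/length) and a local kubo gateway; the CID, offsets, variable name, and gateway address are all placeholder assumptions:

```python
import requests

# Hypothetical virtualizarr-style manifest: each zarr chunk key points at the CID of the
# original netCDF file plus the byte range of that chunk within it (placeholder values).
chunk_manifest = {
    "temperature/0.0.0": {"cid": "bafy...example", "offset": 4096, "length": 1_048_576},
}

GATEWAY = "http://127.0.0.1:8080"  # local kubo gateway; any trustless gateway should work

def read_chunk(key: str) -> bytes:
    """Fetch one chunk's raw bytes via an HTTP Range request to an IPFS gateway.

    The gateway resolves the CID, walks the UnixFS/IPLD DAG, and returns only the blocks
    covering the requested byte range -- the IPFS analogue of an S3 range read.
    """
    entry = chunk_manifest[key]
    headers = {"Range": f"bytes={entry['offset']}-{entry['offset'] + entry['length'] - 1}"}
    resp = requests.get(f"{GATEWAY}/ipfs/{entry['cid']}", headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.content

# raw = read_chunk("temperature/0.0.0")  # then decode/decompress as the zarr codec pipeline would
```

The same subset read could presumably also be done without a gateway (e.g. `ipfs cat` with `--offset`/`--length`), but going through HTTP keeps the access pattern close to existing S3-style range-request workflows.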