Conversation

tomsail (Contributor) commented on Oct 8, 2025

For reference: #57

tomsail (Contributor, Author) commented on Oct 8, 2025

The dask tests probably fail because:

  • read_var_in_frame is being called concurrently on the same file from multiple threads or processes,
  • and/or xr.map_blocks is running many tasks concurrently and those tasks share the same self.slf_reader object inside SelafinLazyArray,
  • and/or the reader's internal file handle/buffer is being corrupted by simultaneous reads, yielding garbled or truncated arrays.

I will try it later, but one lead for a fix would be to open a fresh reader per call,

like so:

class SelafinLazyArray(BackendArray):
    def __init__(self, slf_reader_or_path, var, dtype, shape):
        # Accept either a reader instance, a path, or a reader factory
        self.slf_reader = slf_reader_or_path
        self.var = var
        self.dtype = dtype
        self.shape = shape

    def _open_reader(self):
        # If given a path, open a fresh reader
        if isinstance(self.slf_reader, str):
            return SelafinReader(self.slf_reader)  # <-- adapt to your reader constructor
        # If given a reader instance, reopen a fresh one from its path attribute
        if hasattr(self.slf_reader, "filepath"):
            return SelafinReader(self.slf_reader.filepath)
        # If given a lightweight factory, call it to obtain a fresh reader
        if callable(self.slf_reader):
            return self.slf_reader()
        # Last resort: fall back to the shared instance
        return self.slf_reader

    def _raw_indexing_method(self, key):
        ...
        for it, t in enumerate(time_indices):
            t_int = int(t)
            # open a fresh reader for this task so tasks never share a file handle
            reader = self._open_reader()
            try:
                temp = np.asarray(reader.read_var_in_frame(t_int, self.var))
            finally:
                # close the fresh reader if it supports close()
                if hasattr(reader, "close"):
                    try:
                        reader.close()
                    except Exception:
                        pass
            ...

@lucduron what do you think?
Is SelafinReader light enough to be opened concurrently for multiple time steps and/or chunks?
In other words: is this object lightweight, or does it hold heavy data that could become a problem when handling very big meshes? (I'm thinking about the IPOBO array, for instance.)

- Use a dask.utils.SerializableLock instance to avoid conflicts while reading (a sketch follows this list)
- Refactor some variables in loops to be more precise
- Add a SerafinRequestError in case of buffer truncation (in Serafin.read_var_in_frame)
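
A minimal sketch of how the lock and the truncation check could combine; the helper read_frame_locked, the lock placement, and the error message are illustrative, not the actual implementation:

import numpy as np
from dask.utils import SerializableLock  # picklable, so it also survives scheduler='processes'

lock = SerializableLock()  # one lock shared by all read tasks on the same file

def read_frame_locked(reader, time_index, var):
    # Serialize access to the reader's file handle so concurrent dask tasks
    # cannot interleave seeks and reads on the same buffer
    with lock:
        return np.asarray(reader.read_var_in_frame(time_index, var))

# Inside Serafin.read_var_in_frame (sketch): fail fast on a short read
# instead of silently returning a garbled array
buffer = stream.read(expected_size)
if len(buffer) < expected_size:
    raise SerafinRequestError("Buffer truncated while reading frame")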
lucduron (Collaborator) commented on Oct 15, 2025

Hi @tomsail,

Sorry for the delay, I only had a look at the problem today.
I found some documentation online (see the added docstring) and tried a fix using an appropriate lock.
I do not really understand how xarray works internally, in particular the combination of dask and lazy loading.

The problem is not linked to the Serafin class, as I can use it with multiprocessing/threading on the same file without problems. But with xarray, it seems that some locks are expected, and I tried to follow the documentation.
It seems OK with dask using scheduler='threads' (the default mode) and also with scheduler='processes' (though for the latter I am not sure it is robust).
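
For anyone wanting to reproduce that check, a quick way to exercise both schedulers (the file name, the variable name "S", the dimension name "time", and the chunk size below are placeholders):

import dask
import xarray as xr
from xarray_selafin.xarray_backend import SelafinBackendEntrypoint

ds = xr.open_dataset("mesh.slf", engine=SelafinBackendEntrypoint, lang="fr")
chunked = ds.chunk({"time": 10})  # turn the lazy backend arrays into dask arrays

for scheduler in ("threads", "processes"):
    with dask.config.set(scheduler=scheduler):
        # triggers many concurrent read_var_in_frame calls on the same file
        print(scheduler, float(chunked["S"].mean().compute()))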

Performance should now be checked to see whether this dask implementation is worthwhile (of course we currently read the same buffer multiple times when chunking over nodes, for example; see the illustration below).
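
To make that multiple read concrete: read_var_in_frame returns a whole frame, so a task that owns only a node chunk presumably still reads the full frame buffer and slices out its part. The dimension name and chunk size here are illustrative:

chunked = ds.chunk({"node": 50000})
# With N node chunks, every frame buffer is read N times,
# once by each task that keeps only its own slice of the frame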
I tried to keep dask optional; could you check that everything is OK for you?

Hope it helps,
Luc

lucduron (Collaborator) commented
Hi @tomsail,

Could you check whether #58 is now fixed with my corrections?

Moreover, I did 2 additional commits in this branch to:

  • improve code maintainability and readability
  • improve reading/writing performance when there is no specific selection (it was about 5x slower; it is now closer to the pure PyTelTools implementation)
  • fix issue #59 (mitigate squeezing for [integer] selections) when using isel with an integer (see the short example below)
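
For context on that last point, a tiny self-contained illustration of the standard xarray squeezing semantics the fix restores (dimension and variable names are illustrative):

import numpy as np
import xarray as xr

ds = xr.Dataset({"S": (("time", "node"), np.zeros((4, 10)))})
assert "time" not in ds.isel(time=0).dims      # integer index: the dimension is squeezed out
assert ds.isel(time=[0]).sizes["time"] == 1    # [integer] index: the dimension is kept with length 1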

I am aware that this is not linked to chunked arrays, so feel free to cherry-pick my commits into another branch/PR.

Best Regards,
Luc

lucduron (Collaborator) commented
Hi @tomsail, I added 5 minor commits today.
I think a new version/release should be defined (the first 1.x.x?) and some issues can be closed.
Let me know when you have time to validate that everything is working properly.
Best Regards,
Luc

lucduron (Collaborator) commented
Hi @tomsail,

I ran a quick benchmark on a large 2D Selafin file (3.62 GB) to quantify the IO gain (reading/writing frames in series).
I confirm that this lazy-loading implementation is now about 5 times faster than the current main branch.

Here is the snippet used:

import os
from time import perf_counter, sleep
import xarray as xr

from xarray_selafin.Serafin import Read, Write
from xarray_selafin.xarray_backend import SelafinBackendEntrypoint


slf_in = "r2d_large.slf"  # 3.62 GB (2D file with 215362 nodes, 429285 elements, 409 frames, 11 variables)
slf_out = "temp_out.slf"
lang = "fr"


def remove_slf_out():
    if os.path.exists(slf_out):
        os.remove(slf_out)
    sleep(2.0)


def compare_slf():
    with open(slf_in, "rb") as in_slf1, open(slf_out, "rb") as in_slf2:
        assert in_slf1.read() == in_slf2.read()


remove_slf_out()
t1 = perf_counter()
with Read(slf_in, lang) as resin:
    with Write(slf_out, lang, overwrite=True) as resout:
        resin.read_header()
        resin.get_time()

        out_header = resin.header
        resout.write_header(out_header)
        for time_index, time in enumerate(resin.time):
            resout.write_entire_frame(out_header, time, resin.read_vars_in_frame(time_index=time_index))
t2 = perf_counter()
print("Pure PyTelTools: ", t2 - t1)
compare_slf()


remove_slf_out()
t3 = perf_counter()
ds = xr.open_dataset(
    slf_in,
    disable_lock=True,  # remove this arg for main
    lazy_loading=False,
    engine=SelafinBackendEntrypoint,
    lang=lang,
)
ds.selafin.write(slf_out)
t4 = perf_counter()
print("xarray without lazy_loading: ", t4 - t3)
compare_slf()


remove_slf_out()
t5 = perf_counter()
ds = xr.open_dataset(
    slf_in,
    disable_lock=True,  # remove this arg for main
    lazy_loading=True,
    engine=SelafinBackendEntrypoint,
    lang=lang,
)
ds.selafin.write(slf_out)
t6 = perf_counter()
print("xarray with lazy_loading: ", t6 - t5)
compare_slf()

Below are the detailed results:

# Pure PyTelTools
## Run 1
Pure PyTelTools:  13.904615799999874

## Run 2
Pure PyTelTools:  15.935349300000098

## Run 3
Pure PyTelTools:  15.387261599999874


# chunked_array branch (99c629ec731a233bb625c0b0a5304bd72742aed7)
## Run 1
xarray without lazy_loading:  15.897774000000027
xarray with lazy_loading:  17.503599899999926

## Run 2
xarray without lazy_loading:  18.051799900000105
xarray with lazy_loading:  16.26682819999996

## Run 3
xarray without lazy_loading:  19.455610699999852
xarray with lazy_loading:  16.864858799999865


# main branch (b169d60b57bf6c0fe931240cafd7f8ec575e7f14)
## Run 1
xarray without lazy_loading:  25.59123349999993
xarray with lazy_loading:  91.39370969999993

## Run 2
xarray without lazy_loading:  26.594547399999897
xarray with lazy_loading:  95.7449911

## Run 3
xarray without lazy_loading:  18.76162649999992
xarray with lazy_loading:  110.1280354999999

As for dask, it now works but is probably not fully optimized/adapted; I will not have time to investigate in the short term.

I do not plan to commit further corrections or improvements in the coming days ;).

Luc
