I experienced similar problems.
It is related to the fact that netCDF I/O is not thread-safe, so there is some locking going on.
If you chunk your dataset with ds = ds.chunk(<your chunk options>), save it to disk in Zarr format with ds.to_zarr(), and reopen it with xr.open_zarr(), then everything works fine.
I additionally suggest setting dask.config.set(scheduler='synchronous') to speed up data loading if num_workers > 0 in your DataLoader.
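A rough sketch of how this could fit together with a PyTorch DataLoader. The XarrayTimeDataset class, the make_loader helper, the variable name "temperature", and the batch size are all hypothetical names chosen for illustration; only dask.config.set(scheduler='synchronous') and num_workers come from the suggestion above.

```python
import dask
import numpy as np
import torch
import xarray as xr
from torch.utils.data import DataLoader, Dataset

# Run dask graphs in the calling thread instead of a thread pool, so each
# DataLoader worker process performs its own reads directly.
dask.config.set(scheduler="synchronous")


class XarrayTimeDataset(Dataset):
    """Hypothetical map-style dataset: one sample per step along `time`."""

    def __init__(self, ds: xr.Dataset, var: str = "temperature"):
        self.da = ds[var]

    def __len__(self) -> int:
        return self.da.sizes["time"]

    def __getitem__(self, idx: int) -> torch.Tensor:
        # .values triggers the dask computation for this single sample.
        sample = self.da.isel(time=idx).values
        return torch.from_numpy(np.ascontiguousarray(sample))


def make_loader(ds: xr.Dataset, num_workers: int = 2) -> DataLoader:
    # batch_size is illustrative; num_workers > 0 is the case discussed above.
    return DataLoader(XarrayTimeDataset(ds), batch_size=4, num_workers=num_workers)
```

With num_workers > 0, create and iterate the loader from inside an if __name__ == "__main__": guard so that it also works on platforms that spawn worker processes.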