Deadlock with DataLoader and xarray/dask

ghiggi · February 9, 2021, 4:45pm

I experienced similar problems.
It has something to do with the fact that netCDF I/O is not thread-safe, so there is some locking going on.
If you chunk your dataset with ds = ds.chunk(<your chunk options>), save it to disk in zarr format with ds.to_zarr(), and reopen it with xr.open_zarr(), then everything works fine.
I additionally suggest setting dask.config.set(scheduler='synchronous') to speed up data loading if num_workers > 0 in your DataLoader.
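
For reference, here is a minimal sketch of the workflow described above. The function names (convert_to_zarr, open_for_dataloader), the paths, and the chunk sizes are hypothetical placeholders, not part of xarray or dask. It also assumes zarr is installed alongside xarray and dask.

```python
def convert_to_zarr(netcdf_path, zarr_path, chunks):
    """One-off conversion: open a netCDF file, rechunk it, write a zarr store.

    `chunks` is a dict such as {"time": 100}; the right values depend on
    how your Dataset.__getitem__ slices the data (assumption: tune it yourself).
    """
    # Imported inside the function so the sketch can be loaded
    # even where xarray is not installed.
    import xarray as xr

    ds = xr.open_dataset(netcdf_path)
    ds = ds.chunk(chunks)
    ds.to_zarr(zarr_path, mode="w")  # mode="w" overwrites an existing store
    ds.close()


def open_for_dataloader(zarr_path):
    """Reopen the zarr store lazily, with dask's single-threaded scheduler."""
    import dask
    import xarray as xr

    # Each DataLoader worker is already its own process, so dask's own
    # thread pool is switched off here, as suggested above.
    dask.config.set(scheduler="synchronous")
    return xr.open_zarr(zarr_path)
```

Usage would look roughly like convert_to_zarr("data.nc", "data.zarr", {"time": 100}) run once, then ds = open_for_dataloader("data.zarr") wherever the Dataset is built. If your DataLoader workers are started with the spawn method, the dask setting probably has to be applied inside the worker process too (for example in the Dataset constructor or a worker_init_fn), but I have not verified this for every setup.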