DataLoader issues in Google Colab: RuntimeError

Hi,

I am having issues using a custom dataset in Google Colab. I have NIfTI MRI images in Google Drive, which I transform into 4x240x240 sample and 240x240 label NumPy arrays, and then save locally in Colab as .npy files. So there are two folders with over 18,000 files each. I then use the following code for my dataset:
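For context, here is a minimal sketch of that conversion step. The arrays are dummy stand-ins for the real NIfTI volumes, and the temp directories stand in for the local images/segmentations folders; only the file-naming scheme (`sample1.npy`, `segmentation1.npy`) matches the dataset class below:

```python
import os
import tempfile
import numpy as np

img_dir = tempfile.mkdtemp()   # stands in for the local "images" folder
seg_dir = tempfile.mkdtemp()   # stands in for "segmentations"

# dummy stand-ins for one case: 4 modalities stacked to 4x240x240,
# plus a 240x240 segmentation mask (note the integer dtype of the mask)
sample = np.random.rand(4, 240, 240).astype(np.float32)
label = np.zeros((240, 240), dtype=np.int16)

np.save(os.path.join(img_dir, "sample1.npy"), sample)
np.save(os.path.join(seg_dir, "segmentation1.npy"), label)

# load back to confirm shapes and dtypes survived the round trip
img = np.load(os.path.join(img_dir, "sample1.npy"))
seg = np.load(os.path.join(seg_dir, "segmentation1.npy"))
print(img.shape, img.dtype, seg.shape, seg.dtype)
```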

class BraTS2020_dataset(Dataset):
  def __init__(self, img_path, seg_path,
               transform=None, target_transform=None):
    self.img_path = img_path
    self.seg_path = seg_path
    self.transform = transform
    self.target_transform = target_transform
    self.file_num = len(os.listdir(img_path))

  def __getitem__(self, idx):
    img = np.load(os.path.join(self.img_path,
                               "sample" + str(idx + 1) + ".npy"))
    seg = np.load(os.path.join(self.seg_path,
                               "segmentation" + str(idx + 1) + ".npy"))
    out_img = self.transform(img) if self.transform else img
    if self.target_transform:
      try:
        out_seg = self.target_transform(seg)
      except TypeError:
        # some integer dtypes make the transform raise, so retry as int64
        out_seg = self.target_transform(seg.astype("int64"))
    else:
      out_seg = seg
    return out_img, out_seg

  def __len__(self):
    return self.file_num

If I just create an instance of the dataset class and fetch one item,

data = BraTS2020_dataset("images",
                         "segmentations",
                         transform=ToTensor(),
                         target_transform=ToTensor())
sample, label = data[0]

it works perfectly fine, but if I use the default DataLoader class,

trainloader = DataLoader(data, batch_size=4, shuffle=True, num_workers=2)
example = next(iter(trainloader))

it sometimes works, but other times I get a RuntimeError in a worker process:

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 83, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 83, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: result type Float can't be cast to the desired output type Short

I do not understand where this cast happens. Could this have something to do with how Google Colab handles multiprocessing?

I think I found the problem: if I cast the NumPy arrays img and seg to int64 directly, the error no longer occurs.

NIfTI volumes can contain float values, so some of the saved arrays end up with a float dtype while others are integers. torch.stack then cannot cast the mixed inputs to the expected output dtype, which raises the error.
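A minimal sketch of the fix (the `load_pair` name is hypothetical, and plain arrays stand in for the `np.load(...)` results): cast both arrays to fixed dtypes inside `__getitem__` before converting to tensors, so every item in a batch has identical types no matter which file it came from:

```python
import numpy as np
import torch

def load_pair(img, seg):
    # cast up front so every batch item has identical dtypes
    img = img.astype(np.float32)   # samples always float32
    seg = seg.astype(np.int64)     # labels always int64, never int16/Short
    return torch.from_numpy(img), torch.from_numpy(seg)

# an int16 ("Short") mask like the one that broke collate
img = np.random.rand(4, 240, 240)
seg = np.ones((240, 240), dtype=np.int16)
t_img, t_seg = load_pair(img, seg)
print(t_img.dtype, t_seg.dtype)   # torch.float32 torch.int64
```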

With this code I can reproduce the error:

from torch import stack, short, float, ones
import numpy as np

a = ones([2, 2], dtype=short)
b = ones([2, 2], dtype=float)
storage = a.storage()._new_shared(8)  # shared-memory buffer, as collate preallocates in a worker
out = a.new(storage)                  # Short tensor backed by that buffer
stack([b, a], out=out)                # fails: Float can't be cast to Short
---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-111-eb9ed3493ddd> in <module>()
      6 storage = a.storage()._new_shared(8)
      7 out = a.new(storage)
----> 8 stack([b,a], out=out)

RuntimeError: result type Float can't be cast to the desired output type Short

while the following code raises no error:

from torch import stack, short, float, ones

a = ones([2, 2], dtype=short)
b = ones([2, 2], dtype=float)
stack([b, a])  # no preallocated out buffer: the result is promoted to Float

The problem is the in-place stacking into a preallocated shared-memory buffer, which the DataLoader's default collate function does to save memory when worker processes are used. The buffer takes the dtype of one sample, so a batch with mixed dtypes can fail, and with shuffle=True the mix of files changes every time, which is presumably why the error only appeared intermittently.
So, in conclusion: if you are seeing similar issues, check the dtypes of your arrays.
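One quick way to do that check is to scan the .npy folders for mixed dtypes before training. A sketch using a throwaway folder with deliberately mixed dtypes (in practice, point `folder` at your own images or segmentations directory):

```python
import os
import tempfile
import numpy as np

# throwaway folder with deliberately mixed dtypes, to demonstrate the check
folder = tempfile.mkdtemp()
np.save(os.path.join(folder, "sample1.npy"), np.ones((2, 2), dtype=np.float32))
np.save(os.path.join(folder, "sample2.npy"), np.ones((2, 2), dtype=np.int16))

# collect the set of dtypes across all files in the folder
dtypes = {str(np.load(os.path.join(folder, f)).dtype)
          for f in os.listdir(folder)}
print(dtypes)  # more than one entry means collate can fail, depending on the batch
```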