Hi,
I am having issues using a custom dataset in Google Colab. I have images in Google Drive (NIfTI MRIs) which I transform into 4x240x240 sample and 240x240 label NumPy arrays, and then save locally in Colab as .npy files, so there are two folders with over 18,000 files each. I use the following code for my dataset:
class my_dataset(Dataset):
    def __init__(self, img_path, seg_path,
                 transform=None, target_transform=None):
        ...
        self.file_num = len(os.listdir(img_path))

    def __getitem__(self, idx):
        img = np.load(os.path.join(self.img_path,
                                   "sample" + str(idx + 1) + ".npy"))
        seg = np.load(os.path.join(self.seg_path,
                                   "segmentation" + str(idx + 1) + ".npy"))
        if self.transform:
            out_img = self.transform(img)
        if self.target_transform:
            try:
                out_seg = self.target_transform(seg)
            except TypeError:
                out_seg = self.target_transform(seg.astype("int64"))
        return out_img, out_seg

    def __len__(self):
        return self.file_num
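For context on the dtypes involved: as far as I understand, ToTensor calls torch.from_numpy internally for ndarrays, and from_numpy keeps the NumPy dtype, so each returned tensor's dtype follows whatever dtype that particular .npy file was saved with. A minimal sketch (array shapes and names are just for illustration):

```python
import numpy as np
import torch

# torch.from_numpy preserves the NumPy dtype, so tensors built from
# differently-typed .npy files end up with different tensor dtypes.
short_t = torch.from_numpy(np.zeros((240, 240), dtype=np.int16))
float_t = torch.from_numpy(np.zeros((240, 240), dtype=np.float32))
print(short_t.dtype)  # torch.int16 (a.k.a. Short)
print(float_t.dtype)  # torch.float32 (a.k.a. Float)
```

If I remember correctly, torch.from_numpy also does not support numpy.uint16, which would explain why target_transform raises TypeError for some files only: those labels get cast to int64, while the rest keep their original dtype.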
If I just create an instance of the dataset class and fetch one item
data = my_dataset("images",
                  "segmentations",
                  transform=ToTensor(),
                  target_transform=ToTensor())
sample, label = data[0]
it works perfectly fine, but if I use the default DataLoader class
trainloader = DataLoader(data, batch_size=4, shuffle=True, num_workers=2)
example = next(iter(trainloader))
it sometimes works but other times I get a RuntimeError in one worker process:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 83, in default_collate
return [default_collate(samples) for samples in transposed]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 83, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: result type Float can't be cast to the desired output type Short
I do not understand where this cast happens. Could it have something to do with how Google Colab handles multiprocessing?
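In case it is useful, here is a small check I can run to see whether the saved .npy files actually have mixed on-disk dtypes (the helper name is mine; the two demo files below just stand in for the real "segmentations" folder):

```python
import os
import tempfile
from collections import Counter

import numpy as np

def count_dtypes(folder):
    """Count the on-disk dtypes of all .npy files in a folder."""
    counts = Counter()
    for name in os.listdir(folder):
        if name.endswith(".npy"):
            # mmap_mode avoids loading each full array just to read its dtype
            arr = np.load(os.path.join(folder, name), mmap_mode="r")
            counts[str(arr.dtype)] += 1
    return counts

# Tiny self-contained demo: two files saved with different dtypes.
demo = tempfile.mkdtemp()
np.save(os.path.join(demo, "segmentation1.npy"), np.zeros((240, 240), dtype=np.uint16))
np.save(os.path.join(demo, "segmentation2.npy"), np.zeros((240, 240), dtype=np.float32))
print(count_dtypes(demo))  # e.g. Counter with 'uint16' and 'float32' entries
```

If more than one dtype shows up for the real folders, default_collate would be stacking mixed-dtype tensors per batch, which looks consistent with the traceback above: whether it fails would then depend on which files the shuffled batch happens to draw.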