Hello,
I am using images in DICOM format for my project.
In order to train my model, I implemented a custom dataset, which you’ll find below.
The training succeeds when I use images of size 128x128.
However, since I switched to images of size 512x512, the training hangs after a few epochs, without any error message. When I abort the process manually with Ctrl+C, I get the following output:
^C
Aborted!
^CException ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f4c58bfdb00>>
Traceback (most recent call last):
  File "/home/elsa.schalck/anaconda3/envs/env_kaggle_osic/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 962, in __del__
    self._shutdown_workers()
  File "/home/elsa.schalck/anaconda3/envs/env_kaggle_osic/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 942, in _shutdown_workers
    w.join()
  File "/home/elsa.schalck/anaconda3/envs/env_kaggle_osic/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/home/elsa.schalck/anaconda3/envs/env_kaggle_osic/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/elsa.schalck/anaconda3/envs/env_kaggle_osic/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
I tried reducing the number of DataLoader workers from 8 down to 0. With num_workers=0 the training succeeds, but it is very slow.
I also tried reducing the batch size (to 8, then to 2), but the issue remains whenever multiple workers are used.
I also monitored GPU memory during training, and it does not seem to be the problem.
Do you have any explanation for this kind of hang?
Is there a way to solve it, so that I can keep using multiple workers?
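For reference, here is a simplified sketch of how I build the loader. A synthetic dataset stands in for the real DICOM slices so the settings can be shown in isolation; in my actual training I use num_workers=8 (where the hang occurs), and num_workers=0 is used here only to keep the sketch self-contained:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

# Stand-in dataset: random int16 arrays with the same shape as one
# 512x512 CT slice (1 channel), mimicking what DICOM2D_dataset returns.
class FakeSliceDataset(Dataset):
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        image = np.random.randint(-1024, 3000, size=(1, 512, 512), dtype=np.int16)
        return torch.from_numpy(image.astype(np.float32)), idx

# In training: num_workers=8 (hangs), num_workers=0 (works but slow).
loader = DataLoader(FakeSliceDataset(), batch_size=2, num_workers=0)

batch, idxs = next(iter(loader))
print(batch.shape)  # torch.Size([2, 1, 512, 512])
```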
Thank you !
import os

import numpy as np
import pydicom
from torch.utils.data import Dataset


class DICOM2D_dataset(Dataset):
    def __init__(self, root_dir, patient_df, transform=None, one_img=False, set='train'):
        self.root_dir = root_dir
        self.transform = transform
        self.dir_from_root = '/data/01_raw/osic-pulmonary-fibrosis-progression/' + set
        slices = []
        for series in patient_df['Patient'].tolist():
            series_dir = os.path.join(root_dir + self.dir_from_root, series)
            if not one_img:
                # keep every slice of the series
                for slice_name in os.listdir(series_dir):
                    slices.append(series + '/' + slice_name)
            else:
                # keep only the first slice of the series
                slice_name = os.listdir(series_dir)[0]
                slices.append(series + '/' + slice_name)
        self.slices = slices

    def get_img_hu(self, dicom):
        # rescale raw pixel values to Hounsfield units
        intercept = dicom[0x0028, 0x1052].value  # RescaleIntercept
        slope = dicom[0x0028, 0x1053].value      # RescaleSlope
        image = dicom.pixel_array
        image = (image * slope + intercept).astype(np.int16)
        return image

    def __len__(self):
        return len(self.slices)

    def __getitem__(self, idx):
        path_dicom = os.path.join(self.root_dir + self.dir_from_root, self.slices[idx])
        d = pydicom.dcmread(path_dicom)
        # get image data in Hounsfield units
        image = self.get_img_hu(d)
        # add channel dimension
        image = np.expand_dims(image, 0)
        # apply transformation
        if self.transform:
            image = self.transform(image)
        return image, path_dicom
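Just to double-check the conversion itself: get_img_hu applies the standard DICOM rescale formula, HU = raw_pixel * RescaleSlope + RescaleIntercept. A quick sanity check on synthetic values (no pydicom needed; the slope/intercept below are typical CT values, not from my data):

```python
import numpy as np

# Synthetic raw pixel values and typical CT rescale tags
raw = np.array([[0, 1000], [2000, 3000]], dtype=np.uint16)
slope, intercept = 1.0, -1024.0  # typical RescaleSlope / RescaleIntercept

# Same rescale as get_img_hu: HU = raw * slope + intercept
hu = (raw * slope + intercept).astype(np.int16)
print(hu)  # [[-1024, -24], [976, 1976]]
```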