@ptrblck Here is the detailed error log:
However, if I reduce the num_workers to 0 from 2 then it is working properly, but it increases the training time significantly.
RuntimeError: DataLoader worker (pid 24459) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_data_parallel.py", line 389, in ddp_train
self.run_pretrain_routine(model)
File "/home/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1015, in run_pretrain_routine
self.train()
File "/home/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 347, in train
self.run_training_epoch()
File "/home/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 451, in run_training_epoch
self.run_evaluation(test_mode=self.testing)
File "/home/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 377, in run_evaluation
eval_results = self._evaluate(self.model, dataloaders, max_batches, test_mode)
File "/home/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 256, in _evaluate
for batch_idx, batch in enumerate(dataloader):
File "/home/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
idx, data = self._get_data()
File "/home/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
success, data = self._try_get_data()
File "/home/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 24459) exited unexpectedly