I trained my model successfully a few days ago. However, when I tried to train it again now, I got the error below. I have not changed any code since the last run.
segmentation fault (core dumped)
Then I enabled faulthandler and it showed this:
Fatal Python error: Segmentation fault

Current thread 0x00007f6ac0982740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/comm.py", line 39 in broadcast_coalesced
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 21 in forward
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/replicate.py", line 72 in _broadcast_coalesced_reshape
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/replicate.py", line 89 in replicate
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 159 in replicate
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 154 in forward
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550 in __call__
  File "train_2.py", line 79 in run
  File "train_2.py", line 120 in <module>
Segmentation fault (core dumped)
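For anyone trying to reproduce a trace like the one above, this is a minimal sketch of how faulthandler can be switched on at the top of a training script. The helper name `enable_crash_traceback` and the empty `run()` are placeholders for illustration, not the actual contents of train_2.py.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Ask Python to dump the traceback of all threads on a fatal signal.

    Hypothetical helper: `stream` defaults to the interpreter's original
    stderr, which has the real file descriptor faulthandler needs.
    """
    if stream is None:
        stream = sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


def run():
    # Placeholder for the real training loop.
    pass


if __name__ == '__main__':
    enable_crash_traceback()
    run()
```

The same effect should be available without touching the script by running `python -X faulthandler train_2.py` or by setting the `PYTHONFAULTHANDLER` environment variable.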
Line 81 in train.py is:
predicts = net(image)
Lines 122-123 are:
if __name__ == '__main__':
    run()
I already reinstalled Python 3.6.9 and I still get the same error.
This is my DataParallel code:
cuda_available = torch.cuda.is_available()
device_ids = [0,1] #number of gpu available
torch.cuda.set_device(device_ids)
if cuda_available:
    net = net.cuda()
    net = nn.DataParallel(net, device_ids=device_ids)
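For comparison, here is a hedged sketch of a more defensive variant of the setup above. It is not a confirmed fix for the segmentation fault. It only checks the requested device IDs against `torch.cuda.device_count()` before wrapping the model, and it passes a single device to `torch.cuda.set_device`, which expects one device rather than a list. The function name `wrap_for_gpus` and the `nn.Linear` stand-in model are assumptions made for illustration.

```python
import torch
import torch.nn as nn


def wrap_for_gpus(net, requested_ids=(0, 1)):
    """Move `net` to the GPU and wrap it in DataParallel when possible.

    Hypothetical helper: falls back to the unchanged CPU model when CUDA
    is unavailable or none of the requested GPUs is visible.
    """
    if not torch.cuda.is_available():
        return net

    # Keep only the device IDs that this process can actually see.
    visible = torch.cuda.device_count()
    device_ids = [i for i in requested_ids if i < visible]
    if not device_ids:
        return net

    # set_device takes a single device, so use the first usable one.
    torch.cuda.set_device(device_ids[0])
    net = net.cuda(device_ids[0])

    # DataParallel only makes sense with more than one GPU.
    if len(device_ids) > 1:
        net = nn.DataParallel(net, device_ids=device_ids)
    return net


if __name__ == '__main__':
    model = wrap_for_gpus(nn.Linear(4, 2))  # stand-in for the real network
    device = next(model.parameters()).device
    print(model(torch.zeros(3, 4, device=device)).shape)
```

Printing `torch.cuda.device_count()` and `torch.version.cuda` before training may also help show whether the GPU or driver environment changed between the working run and the failing one.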