I trained my model a few days ago and it worked. However, when I tried to train it again now, I received the error below. I have not changed any code since the last run.
segmentation fault (core dumped)
Then I enabled faulthandler, and it showed this:
Fatal Python error: Segmentation fault
Current thread 0x00007f6ac0982740 (most recent call first):
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/comm.py", line 39 in broadcast_coalesced
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 21 in forward
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/replicate.py", line 72 in _broadcast_coalesced_reshape
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/replicate.py", line 89 in replicate
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 159 in replicate
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 154 in forward
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550 in __call__
File "train_2.py", line 79 in run
File "train_2.py", line 120 in <module>
Segmentation fault (core dumped)
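For reference, a minimal sketch of how a traceback like the one above can be obtained with the standard-library faulthandler module. The placement at the top of the script and the empty run() body are assumptions for illustration, not my actual training script:

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL so that
# the Python traceback is written to stderr when the process crashes.
faulthandler.enable()


def run():
    # Placeholder (hypothetical): the real training loop would go here.
    pass


if __name__ == '__main__':
    run()
```

The same handlers can also be turned on without editing the script by running it as python3 -X faulthandler train_2.py.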
Line 81 in train.py is:
predicts = net(image)
Lines 122-123 are:
if __name__ == '__main__':
    run()
I have already reinstalled Python 3.6.9, and it still gives me the same error.
This is my DataParallel code:
cuda_available = torch.cuda.is_available()
device_ids = [0, 1]  # indices of the GPUs to use
torch.cuda.set_device(device_ids[0])
if cuda_available:
    net = net.cuda()
    net = nn.DataParallel(net, device_ids=device_ids)
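Since the crash happens while DataParallel copies the model to the other GPU, one sanity check that might be worth running is whether every index in device_ids is still visible to the process. Below is a rough sketch of that check. The helper name usable_device_ids is hypothetical, and visible_count is hard-coded here only for illustration; in the real script it would come from torch.cuda.device_count():

```python
def usable_device_ids(requested, visible_count):
    """Return the requested GPU indices that exist among the visible devices."""
    return [i for i in requested if 0 <= i < visible_count]


device_ids = [0, 1]

# Assumption for illustration: pretend only one GPU is visible.
# In the training script this would be torch.cuda.device_count().
visible_count = 1

ok_ids = usable_device_ids(device_ids, visible_count)
if ok_ids != device_ids:
    print("Some requested GPUs are not visible:", sorted(set(device_ids) - set(ok_ids)))
```

If the list comes back shorter than device_ids, that would suggest a GPU has dropped out or CUDA_VISIBLE_DEVICES has changed since the last successful run. It would not explain the segfault by itself, but it narrows things down.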