Thanks for your reply.
I got this traceback with `torch.autograd.set_detect_anomaly(True)`:
```
[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
  File "<string>", line 1, in <module>
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/multiprocessing/spawn.py", line 118, in _main
    return self._bootstrap()
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/lr/wuhao/ssl-consistency-pytorch/remixmatch.py", line 220, in main_worker
    trainer(args, logger=logger)
  File "/home/lr/wuhao/ssl-consistency-pytorch/models/remixmatch/remixmatch.py", line 153, in train
    logits_rot = self.rot_classifier(x_ulb_s1_rot)
  File "/home/lr/wuhao/ssl-consistency-pytorch/models/nets/wrn.py", line 117, in rot_classify
    out = self.relu(self.bn1(out))
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 140, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/home/lr/wuhao/anaconda3/envs/ssl/lib/python3.6/site-packages/torch/nn/functional.py", line 2147, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
 (function _print_stack)
```
It seems the `relu` part of the model is causing this error. However, after checking the model code, I found that I had already defined the ReLU as a non-inplace operation: `self.relu = nn.LeakyReLU(negative_slope=leaky_slope, inplace=False)`.

Why does it still cause this error? Could you give me a hint?
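For reference, here is a minimal, self-contained sketch (hypothetical code, not taken from my actual model) that produces a similar anomaly report. As far as I understand, batch norm saves its *input* for the backward pass, so an in-place edit of that tensor after the forward call can trigger an error attributed to `BatchNormBackward`, even when the ReLU itself is non-inplace:

```python
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)

bn = nn.BatchNorm1d(4)
relu = nn.LeakyReLU(negative_slope=0.1, inplace=False)  # non-inplace, as in my model

x = torch.randn(8, 4, requires_grad=True)
h = x * 2.0            # non-leaf intermediate tensor
out = relu(bn(h))      # batch norm saves `h` for its backward pass

h.add_(1.0)            # in-place edit of the saved tensor AFTER the forward call

try:
    out.sum().backward()
    print("no error")
except RuntimeError as err:
    # anomaly mode reports the batch-norm forward call above as the culprit
    print(type(err).__name__)  # prints "RuntimeError"
```

So perhaps some other in-place operation on a tensor that `self.bn1` saved, rather than the ReLU itself, is the real cause here?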