RuntimeError: CUDA error: unspecified launch failure

Training works normally when I use a simple network as the backbone, but with a more complex network it fails with the following error:
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: unspecified launch failure (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:764)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fe222fa3193 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x17f66 (0x7fe2231e0f66 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x19cbd (0x7fe2231e2cbd in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x4d (0x7fe222f9363d in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #4: c10d::Reducer::~Reducer() + 0x449 (0x7fe2245b0b19 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #5: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fe22458e8f2 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe223de8336 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x9f952b (0x7fe22458f52b in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x2942d0 (0x7fe223e2a2d0 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x29555e (0x7fe223e2b55e in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #10: /usr/bin/python() [0x586885]
frame #11: /usr/bin/python() [0x56d0d5]
frame #12: /usr/bin/python() [0x4e9767]
frame #13: /usr/bin/python() [0x51b357]
frame #14: /usr/bin/python() [0x51b36d]
frame #15: /usr/bin/python() [0x5beb98]
frame #16: /usr/bin/python() [0x5bec2e]
frame #17: /usr/bin/python() [0x62ee33]
frame #18: PyEval_EvalFrameEx + 0x4f5f (0x53fcdf in /usr/bin/python)
frame #19: PyEval_EvalFrameEx + 0x49f4 (0x53f774 in /usr/bin/python)
frame #20: PyEval_EvalFrameEx + 0x49f4 (0x53f774 in /usr/bin/python)
frame #21: /usr/bin/python() [0x5441d9]
frame #22: PyEval_EvalFrameEx + 0x50de (0x53fe5e in /usr/bin/python)
frame #23: /usr/bin/python() [0x5441d9]
frame #24: PyEval_EvalCode + 0x1f (0x544eaf in /usr/bin/python)
frame #25: PyRun_StringFlags + 0x8f (0x57bd1f in /usr/bin/python)
frame #26: PyRun_SimpleStringFlags + 0x3c (0x6257ac in /usr/bin/python)
frame #27: Py_Main + 0x581 (0x63efe1 in /usr/bin/python)
frame #28: main + 0xe1 (0x4d13f1 in /usr/bin/python)
frame #29: __libc_start_main + 0xf0 (0x7fe22868c840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #30: _start + 0x29 (0x5d62d9 in /usr/bin/python)

Traceback (most recent call last):
  File "train.py", line 367, in <module>
    main()
  File "train.py", line 44, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, cfg, val_dataset))
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/tensorflow-facenet/train.py", line 298, in main_worker
    optimizer.step()
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/lr_scheduler.py", line 66, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 100, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: CUDA error: unspecified launch failure

Are you using the latest PyTorch release (1.7.1)? If not, could you update to it and rerun your script? If you are already on the latest version, could you post a minimal code snippet that reproduces the issue, along with your current setup, i.e. the GPU, CUDA version, cuDNN version, etc.?
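
In case it helps with collecting the setup details, here is a minimal sketch that prints them. `collect_setup` is a hypothetical helper name chosen for this example, not part of PyTorch. It assumes only that Python is available, reports `torch` as `None` if PyTorch cannot be imported, and uses `str.format` rather than f-strings so it should also run on the Python 3.5 shown in the traceback.

```python
import platform


def collect_setup():
    """Gather interpreter, PyTorch, CUDA, cuDNN and GPU details into a dict.

    Hypothetical helper for illustration; the key names are arbitrary.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    # Names of all visible GPUs (empty list if there are none).
    info["gpus"] = [
        torch.cuda.get_device_name(i)
        for i in range(torch.cuda.device_count())
    ]
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_setup().items()):
        print("{}: {}".format(key, value))
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report and is usually the easier option when it is available in your version.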