I’ve been running the same code for several weeks without any problems, but just the other day I started getting the error below. Training proceeds fine for several thousand iterations, and then a CUDA error is raised, seemingly at a random iteration. I have no idea what the cause could be. Here is the error message:
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:763 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa58ba762f2 in /home/catalys1/venv/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fa58ba7367b in /home/catalys1/venv/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc92 (0x7fa58bcce682 in /home/catalys1/venv/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fa58ba5e3a4 in /home/catalys1/venv/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e415a (0x7fa5de63915a in /home/catalys1/venv/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x233ea2 (0x55f73a3bcea2 in /home/catalys1/venv/bin/python)
frame #6: <unknown function> + 0x23383e (0x55f73a3bc83e in /home/catalys1/venv/bin/python)
frame #7: _PyObject_GC_New + 0xaa (0x55f73a3407ca in /home/catalys1/venv/bin/python)
frame #8: PyMethod_New + 0x25 (0x55f73a35cb75 in /home/catalys1/venv/bin/python)
frame #9: <unknown function> + 0x160423 (0x55f73a2e9423 in /home/catalys1/venv/bin/python)
frame #10: _PyObject_GetMethod + 0x10b (0x55f73a2d79cb in /home/catalys1/venv/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x541 (0x55f73a313bc1 in /home/catalys1/venv/bin/python)
frame #12: <unknown function> + 0x189ebf (0x55f73a312ebf in /home/catalys1/venv/bin/python)
frame #13: _PyObject_Call_Prepend + 0x46f (0x55f73a2abc7f in /home/catalys1/venv/bin/python)
frame #14: <unknown function> + 0x160aba (0x55f73a2e9aba in /home/catalys1/venv/bin/python)
frame #15: <unknown function> + 0x15db61 (0x55f73a2e6b61 in /home/catalys1/venv/bin/python)
frame #16: <unknown function> + 0x1d7785 (0x55f73a360785 in /home/catalys1/venv/bin/python)
frame #17: PyObject_Call + 0x22c (0x55f73a2ac40c in /home/catalys1/venv/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x2f9a (0x55f73a31661a in /home/catalys1/venv/bin/python)
frame #19: <unknown function> + 0x189ac1 (0x55f73a312ac1 in /home/catalys1/venv/bin/python)
frame #20: _PyObject_Call_Prepend + 0x46f (0x55f73a2abc7f in /home/catalys1/venv/bin/python)
frame #21: <unknown function> + 0x209eb9 (0x55f73a392eb9 in /home/catalys1/venv/bin/python)
frame #22: _PyObject_MakeTpCall + 0x7e (0x55f73a2aa76e in /home/catalys1/venv/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x51d3 (0x55f73a318853 in /home/catalys1/venv/bin/python)
frame #24: <unknown function> + 0x18a068 (0x55f73a313068 in /home/catalys1/venv/bin/python)
frame #25: _PyFunction_Vectorcall + 0x19d (0x55f73a2ab0ed in /home/catalys1/venv/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x3e9 (0x55f73a313a69 in /home/catalys1/venv/bin/python)
frame #27: _PyEval_EvalCodeWithName + 0x252 (0x55f73a3120a2 in /home/catalys1/venv/bin/python)
frame #28: PyEval_EvalCode + 0x27 (0x55f73a3a3147 in /home/catalys1/venv/bin/python)
frame #29: <unknown function> + 0x26fd82 (0x55f73a3f8d82 in /home/catalys1/venv/bin/python)
frame #30: <unknown function> + 0x1d9a03 (0x55f73a362a03 in /home/catalys1/venv/bin/python)
frame #31: PyObject_Call + 0x1d2 (0x55f73a2ac3b2 in /home/catalys1/venv/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x5c8e (0x55f73a31930e in /home/catalys1/venv/bin/python)
frame #33: <unknown function> + 0x189ac1 (0x55f73a312ac1 in /home/catalys1/venv/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19d (0x55f73a2ab0ed in /home/catalys1/venv/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x4c3d (0x55f73a3182bd in /home/catalys1/venv/bin/python)
frame #36: _PyFunction_Vectorcall + 0x103 (0x55f73a2ab053 in /home/catalys1/venv/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x663 (0x55f73a313ce3 in /home/catalys1/venv/bin/python)
frame #38: _PyFunction_Vectorcall + 0x103 (0x55f73a2ab053 in /home/catalys1/venv/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x3e9 (0x55f73a313a69 in /home/catalys1/venv/bin/python)
frame #40: _PyFunction_Vectorcall + 0x103 (0x55f73a2ab053 in /home/catalys1/venv/bin/python)
frame #41: _PyEval_EvalFrameDefault + 0x3e9 (0x55f73a313a69 in /home/catalys1/venv/bin/python)
frame #42: _PyFunction_Vectorcall + 0x103 (0x55f73a2ab053 in /home/catalys1/venv/bin/python)
frame #43: <unknown function> + 0x121bed (0x55f73a2aabed in /home/catalys1/venv/bin/python)
frame #44: _PyObject_CallMethodIdObjArgs + 0x135 (0x55f73a2ac885 in /home/catalys1/venv/bin/python)
frame #45: PyImport_ImportModuleLevelObject + 0x3da (0x55f73a32bc7a in /home/catalys1/venv/bin/python)
frame #46: <unknown function> + 0x1de1fc (0x55f73a3671fc in /home/catalys1/venv/bin/python)
frame #47: <unknown function> + 0x1d9b1b (0x55f73a362b1b in /home/catalys1/venv/bin/python)
frame #48: PyObject_Call + 0x22c (0x55f73a2ac40c in /home/catalys1/venv/bin/python)
frame #49: _PyEval_EvalFrameDefault + 0x5c8e (0x55f73a31930e in /home/catalys1/venv/bin/python)
frame #50: <unknown function> + 0x189ac1 (0x55f73a312ac1 in /home/catalys1/venv/bin/python)
frame #51: _PyFunction_Vectorcall + 0x19d (0x55f73a2ab0ed in /home/catalys1/venv/bin/python)
frame #52: _PyEval_EvalFrameDefault + 0x3e9 (0x55f73a313a69 in /home/catalys1/venv/bin/python)
frame #53: <unknown function> + 0x18a068 (0x55f73a313068 in /home/catalys1/venv/bin/python)
frame #54: _PyFunction_Vectorcall + 0x19d (0x55f73a2ab0ed in /home/catalys1/venv/bin/python)
frame #55: <unknown function> + 0x121bed (0x55f73a2aabed in /home/catalys1/venv/bin/python)
frame #56: _PyObject_CallMethodIdObjArgs + 0x135 (0x55f73a2ac885 in /home/catalys1/venv/bin/python)
frame #57: PyImport_ImportModuleLevelObject + 0x46e (0x55f73a32bd0e in /home/catalys1/venv/bin/python)
frame #58: _PyEval_EvalFrameDefault + 0x3294 (0x55f73a316914 in /home/catalys1/venv/bin/python)
frame #59: _PyFunction_Vectorcall + 0x103 (0x55f73a2ab053 in /home/catalys1/venv/bin/python)
frame #60: _PyEval_EvalFrameDefault + 0x663 (0x55f73a313ce3 in /home/catalys1/venv/bin/python)
frame #61: _PyFunction_Vectorcall + 0x103 (0x55f73a2ab053 in /home/catalys1/venv/bin/python)
frame #62: _PyEval_EvalFrameDefault + 0x663 (0x55f73a313ce3 in /home/catalys1/venv/bin/python)
frame #63: <unknown function> + 0x18a068 (0x55f73a313068 in /home/catalys1/venv/bin/python)
Does anyone have any idea what could be causing this issue, and how I can fix it? Any help would be much appreciated. Thank you!