I ran into a strange problem while using YOLOv7 with PyTorch
to train my own model. Here are the details:
- I am using an RTX 4060 notebook GPU with PyTorch 2.3.1+cu118 and CUDA 11.8.
- Before training this model, I had successfully trained other models with the same setup; only the dataset was different.
- When I switched back to a dataset that had trained successfully before, training still worked.
- The new dataset (the one that fails to train) is smaller than the old one.
- The names in the new dataset look like ‘10-CMKNEL’, while those in the old one look like ‘yes0001’.
- The error appeared right after the first training epoch finished. Here is the output:
[00:00<00:0C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed. C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed. C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed. C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed. Unhandled exception caught in c10/util/AbortHandler.h 00007FFCC025805400007FFCC023C190 torch_python.dll!THPGenerator_initDefaultGenerator [<unknown file> @ <unknown line number>] 00007FFD1515EE1200007FFD1515EDF0 ucrtbase.dll!terminate [<unknown file> @ <unknown line number>] 00007FFD07631AAB00007FFD07631150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>] 00007FFD0763231700007FFD07631150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>] 00007FFD076340D900007FFD07634030 VCRUNTIME140_1.dll!_CxxFrameHandler4 [<unknown file> @ <unknown line number>] 00007FFD17C1504F00007FFD17C14F20 ntdll.dll!_chkstk [<unknown file> @ <unknown line number>] 00007FFD17B8E86600007FFD17B8DDD0 ntdll.dll!RtlFindCharInUnicodeString [<unknown file> @ <unknown line number>] 00007FFD17BC494500007FFD17BC47B0 ntdll.dll!RtlRaiseException [<unknown file> @ <unknown line number>] 00007FFD1534F39C00007FFD1534F330 KERNELBASE.dll!RaiseException [<unknown file> @ <unknown line number>] 00007FFD0816648000007FFD081663F0 VCRUNTIME140.dll!CxxThrowException 
[<unknown file> @ <unknown line number>] 00007FFCC3B2319E00007FFCC3B23130 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>] 00007FFCC480F93F00007FFCC480F640 c10_cuda.dll!c10::cuda::c10_cuda_check_implementation [<unknown file> @ <unknown line number>] 00007FFB719764F100007FFB71973390 torch_cuda.dll!at::cuda::CachingHostAllocator_recordEvent [<unknown file> @ <unknown line number>] 00007FFCC3AEAE4D00007FFCC3AEAC80 c10.dll!c10::SymBool::operator| [<unknown file> @ <unknown line number>] 00007FFCC3ADEFC200007FFCC3ADEF50 c10.dll!c10::ConstantSymNodeImpl<bool>::~ConstantSymNodeImpl<bool> [<unknown file> @ <unknown line number>] 00007FFCC3B0D43500007FFCC3B0D3C0 c10.dll!c10::TensorImpl::~TensorImpl [<unknown file> @ <unknown line number>] 00007FFB69CB966500007FFB69CB7E40 torch_cpu.dll!at::DynamicLibrary::sym [<unknown file> @ <unknown line number>] 00007FFB69C5E71D00007FFB69C5E690 torch_cpu.dll!at::TensorBase::reset [<unknown file> @ <unknown line number>] 00007FFCC026378500007FFCC0258860 torch_python.dll!initModule [<unknown file> @ <unknown line number>] 00007FFCC02F188500007FFCC02BC540 torch_python.dll!THPPointer<THPStorage>::THPPointer<THPStorage> [<unknown file> @ <unknown line number>] 00007FFCC02F651D00007FFCC02F4160 torch_python.dll!THPVariable_Wrap [<unknown file> @ <unknown line number>] 00007FFCC3E21E0A00007FFCC3E21CE0 python310.dll!PyList_Append [<unknown file> @ <unknown line number>] 00007FFCC3E504FC00007FFCC3E50350 python310.dll!PyTuple_Pack [<unknown file> @ <unknown line number>] 00007FFCC3E0A71F00007FFCC3E0A060 python310.dll!PyObject_GenericGetDict [<unknown file> @ <unknown line number>] 00007FFCC3EEB7CD00007FFCC3EE7A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>] 00007FFCC3EECE7B00007FFCC3EE7A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>] 00007FFCC3DF585E00007FFCC3DF5820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>] 
00007FFCC3EE61A900007FFCC3EE6060 python310.dll!PyOS_URandomNonblock [<unknown file> @ <unknown line number>] 00007FFCC3EEE6F200007FFCC3EEE300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>] 00007FFCC3EEAF0800007FFCC3EE7A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>] 00007FFCC3EECE7B00007FFCC3EE7A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>] 00007FFCC3DF585E00007FFCC3DF5820 python310.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>] 00007FFCC3EE61A900007FFCC3EE6060 python310.dll!PyOS_URandomNonblock [<unknown file> @ <unknown line number>] 00007FFCC3EEE6F200007FFCC3EEE300 python310.dll!PyEval_GetFuncDesc [<unknown file> @ <unknown line number>] 00007FFCC3EEA8E200007FFCC3EE7A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>] 00007FFCC3EECE7B00007FFCC3EE7A70 python310.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>] 00007FFCC3EE78C200007FFCC3EE7840 python310.dll!PyEval_EvalCode [<unknown file> @ <unknown line number>] 00007FFCC3F5E08E00007FFCC3F5DD50 python310.dll!PyRun_FileExFlags [<unknown file> @ <unknown line number>] 00007FFCC3F5E16800007FFCC3F5DD50 python310.dll!PyRun_FileExFlags [<unknown file> @ <unknown line number>] 00007FFCC3F5DD1800007FFCC3F5DBB0 python310.dll!PyRun_StringFlags [<unknown file> @ <unknown line number>] 00007FFCC3F5BF7500007FFCC3F5BCD0 python310.dll!PyRun_SimpleFileObject [<unknown file> @ <unknown line number>] 00007FFCC3F5B12400007FFCC3F5B060 python310.dll!PyRun_AnyFileObject [<unknown file> @ <unknown line number>] 00007FFCC3D785DC00007FFCC3D70390 python310.dll!PyObject_GC_IsFinalized [<unknown file> @ <unknown line number>] 00007FFCC3D78FED00007FFCC3D70390 python310.dll!PyObject_GC_IsFinalized [<unknown file> @ <unknown line number>] 00007FFCC3D79E9300007FFCC3D79310 python310.dll!Py_RunMain [<unknown file> @ <unknown line number>] 00007FFCC3D79F0600007FFCC3D79EE0 
python310.dll!Py_Main [<unknown file> @ <unknown line number>] 00007FF72ED1149400007FF72ED11110 python.exe!OPENSSL_Applink [<unknown file> @ <unknown line number>] 00007FFD16AA257D00007FFD16AA2560 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>] 00007FFD17BCAF2800007FFD17BCAF00 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
How can I deal with this?
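
The assertion from IndexKernel.cu in the log says that some tensor index fell outside the valid range. Since the same code trains fine on the old dataset, one thing that may be worth ruling out is the label files of the new dataset. Below is a minimal sketch of such a check. It assumes (this is not confirmed anywhere above) that the labels are YOLO-format .txt files where every row is `class x_center y_center width height` with normalized coordinates, and that class ids must be smaller than the number of classes configured for training. The function name `find_bad_labels`, the folder `data/train/labels`, and `num_classes=1` are hypothetical placeholders.

```python
from pathlib import Path


def find_bad_labels(label_dir, num_classes):
    """Return (file name, line number, reason) for each suspicious label row."""
    problems = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            parts = line.split()
            if not parts:
                continue  # blank lines are harmless
            if len(parts) != 5:
                problems.append((path.name, line_no, "expected 5 fields"))
                continue
            try:
                raw_id = float(parts[0])
                class_id = int(raw_id)
                box = [float(v) for v in parts[1:]]
            except (ValueError, OverflowError):
                problems.append((path.name, line_no, "non-numeric field"))
                continue
            # Assumption: valid class ids are the integers 0 .. num_classes - 1.
            if raw_id != class_id or not 0 <= class_id < num_classes:
                reason = f"class id {parts[0]} outside 0..{num_classes - 1}"
                problems.append((path.name, line_no, reason))
            # Assumption: box values are normalized to the range [0, 1].
            if any(not (0.0 <= v <= 1.0) for v in box):
                problems.append((path.name, line_no, "box value outside [0, 1]"))
    return problems


if __name__ == "__main__":
    # Hypothetical path and class count; replace them with your own values.
    for name, line_no, reason in find_bad_labels("data/train/labels", num_classes=1):
        print(f"{name}:{line_no}: {reason}")
```

If this reports nothing, the labels are probably not the cause under these assumptions. In that case, running the training once on the CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1` set, usually produces a Python traceback that points at the exact indexing line.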