Hello all,
I get an error when loading a network on the GPU. I link libTorch 2.5.1 (built against CUDA 12.4) into my code. It works fine on my local machine (GTX 1070) and on a testing machine (RTX 4070 Ti). However, when I move the code to a computing node with an A100, the solver throws an error while loading the network. The error looks like the following:
terminate called after throwing an instance of 'c10::Error'
what(): _ivalue_ INTERNAL ASSERT FAILED at "XXPath_To_Codes/ThirdParty/libtorchCUDA/include/torch/csrc/jit/api/object.h":38, please report a bug to PyTorch.
Exception raised from _ivalue at XXPath_to_Codes/ThirdParty/libtorchCUDA/include/torch/csrc/jit/api/object.h:38 (most recent call first):
I set up the same environment (CUDA driver) on the computing node and have no idea how to address this issue. Do you have any suggestions?
If I load the network on the CPU instead, there is no problem.
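For reference, the loading code follows the standard TorchScript pattern (a simplified sketch; the file name and error handling here are placeholders, not my exact code):

```cpp
#include <torch/script.h>
#include <iostream>

int main() {
  torch::jit::script::Module module;
  try {
    // Load the serialized TorchScript network from disk,
    // then move its parameters and buffers to the GPU.
    module = torch::jit::load("model.pt");
    module.to(torch::kCUDA);  // works on GTX 1070 / RTX 4070 Ti, fails on the A100 node
  } catch (const c10::Error& e) {
    std::cerr << "Error loading the model:\n" << e.what() << std::endl;
    return -1;
  }
  return 0;
}
```

Replacing `torch::kCUDA` with `torch::kCPU` (or skipping the `to()` call) avoids the error, as mentioned above.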
Note that the A100 is partitioned into 7 MIG instances; I don't know whether that could be the issue.
Thanks for your time.
Best, Weitao.