Hi, I am trying to run the following code on CUDA but get the error below. It runs fine on the CPU, and also on CUDA with a smaller matrix size (e.g. 400 instead of 4096). I am using a Volta (V100) GPU with PyTorch 1.9 and CUDA 11.1. Can anyone recommend a workaround? Thanks.
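```python
import torch

A = torch.randn(2, 3, 4096).cuda()  # 2 batches, 3 rows, 4096 right-hand-side columns
B = torch.randn(2, 3, 3).cuda()     # 2 batches of 3x3 coefficient matrices
X = torch.linalg.solve(B, A)        # solves B @ X = A; fails on CUDA for 4096 columns
```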
```
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_29550/983601019.py in
1 A = torch.randn(2, 3, 4096).cuda()
2 B = torch.randn(2, 3, 3).cuda()
----> 3 X = torch.linalg.solve( B,A)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
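Since the same call succeeds with 400 columns, one workaround that may be worth trying is to split `A` along its last dimension, so that each `torch.linalg.solve` call sees fewer right-hand sides, and then concatenate the partial solutions. This works because each column of `X` depends only on the matching column of `A`. It is a sketch that assumes the launch failure depends on the number of right-hand-side columns, not a confirmed diagnosis. `chunked_solve` is a hypothetical helper, and `chunk_size=256` is an arbitrary value picked to stay below the 400 columns reported to work.

```python
import torch


def chunked_solve(B: torch.Tensor, A: torch.Tensor, chunk_size: int = 256) -> torch.Tensor:
    """Solve B @ X = A by splitting A's columns into chunks (hypothetical helper).

    Assumption: keeping each torch.linalg.solve call at <= chunk_size
    right-hand-side columns avoids the CUDA launch error seen with 4096.
    """
    # torch.split along the last dim yields tensors of shape (..., n, <= chunk_size)
    chunks = torch.split(A, chunk_size, dim=-1)
    # Columns of X are independent, so per-chunk solutions can be concatenated
    return torch.cat([torch.linalg.solve(B, a) for a in chunks], dim=-1)


# Falls back to the CPU when no GPU is present so the sketch still runs
device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(2, 3, 4096, device=device)
B = torch.randn(2, 3, 3, device=device)

X = chunked_solve(B, A)
print(X.shape)  # torch.Size([2, 3, 4096])
# Largest residual of B @ X - A; should be small unless B is ill-conditioned
print((B @ X - A).abs().max().item())
```

If chunking does not help, computing `torch.linalg.inv(B) @ A` for these small 3×3 systems is another option worth trying, at some cost in numerical accuracy. To get a reliable stack trace, `CUDA_LAUNCH_BLOCKING=1` has to be in the environment before CUDA is initialized, e.g. `CUDA_LAUNCH_BLOCKING=1 python script.py`.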