I’m having trouble inverting a matrix on the GPU, even though the same matrix inverts fine on the CPU. I am using Google Colab with torch 1.3.0+cu100. Here is my code:
import torch
dim = 100
# CPU inversion
A = torch.rand(dim,dim,device='cpu')
Ainv = A.inverse()
print(torch.matmul(A,Ainv))
# GPU inversion
A = A.to('cuda')
Ainv = A.inverse()
print(torch.matmul(A,Ainv))
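Printing the full 100×100 product makes it hard to judge how close it is to the identity. A minimal sketch (not part of the original code) reduces the check to one number, the largest entry of |A·A⁻¹ − I|, computed on whichever device the matrix lives on. The helper name `inverse_residual` is hypothetical:

```python
import torch

def inverse_residual(A: torch.Tensor) -> float:
    """Largest absolute entry of A @ inv(A) - I, computed on A's device."""
    Ainv = A.inverse()
    eye = torch.eye(A.shape[0], dtype=A.dtype, device=A.device)
    return (torch.matmul(A, Ainv) - eye).abs().max().item()

dim = 100
A = torch.rand(dim, dim, device='cpu')
print("CPU residual:", inverse_residual(A))
if torch.cuda.is_available():
    # Same matrix copied to the GPU; this is the step that fails in the post
    print("GPU residual:", inverse_residual(A.to('cuda')))
```

On a working setup both residuals would be expected to be small (float32 round-off); a CUDA error, or a GPU residual far larger than the CPU one, points at the GPU code path.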
For a small matrix (e.g. dim = 100), I get the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
For a large matrix (e.g. dim = 1000), I get the following error:
RuntimeError: inverse_cuda: U(1,1) is zero, singular U.
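The second message refers to a zero pivot in the U factor of an LU-style factorization, i.e. the GPU routine treats the matrix as singular. One way to confirm that the matrix really is nonsingular is to check its log-determinant on the CPU in double precision. This is a hedged sketch; `is_numerically_nonsingular` is a hypothetical helper, and `torch.slogdet` is used because a plain determinant of a 1000×1000 matrix can overflow or underflow:

```python
import torch

def is_numerically_nonsingular(A: torch.Tensor) -> bool:
    """True if the CPU, float64 log-determinant reports a nonzero, finite det."""
    # slogdet returns (sign, log|det|); sign is 0 when the determinant is 0
    sign, logabsdet = torch.slogdet(A.double().cpu())
    return sign.item() != 0 and torch.isfinite(logabsdet).item()

A = torch.rand(1000, 1000)
print(is_numerically_nonsingular(A))
```

If this reports True for the same matrix that fails with "singular U" on the GPU, the failure is more likely in the GPU path than in the data.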
In both cases, the inversion goes fine on the CPU, but inverting the same matrix on the GPU fails. Any help is appreciated!
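The first error names cublasSgemm, cuBLAS's single-precision matrix multiply, which may be reached either from the `torch.matmul` line or from inside the inversion. Testing matmul on its own, with no inversion involved, can help tell the two apart. A hedged sketch with a hypothetical helper `matmul_matches_cpu` that multiplies two random matrices on a given device and compares the result with the CPU:

```python
import torch

def matmul_matches_cpu(dim: int = 100, device: str = 'cuda', tol: float = 1e-3) -> bool:
    """Multiply two random matrices on `device` and compare with the CPU result."""
    A = torch.rand(dim, dim)
    B = torch.rand(dim, dim)
    expected = torch.matmul(A, B)  # CPU reference
    # .cpu() copies the result back, which also waits for the GPU to finish
    actual = torch.matmul(A.to(device), B.to(device)).cpu()
    return torch.allclose(actual, expected, rtol=tol, atol=tol)

if torch.cuda.is_available():
    print("GPU matmul matches CPU:", matmul_matches_cpu(device='cuda'))
```

If this already raises CUBLAS_STATUS_EXECUTION_FAILED, the problem lies in the cuBLAS setup rather than in matrix inversion.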
Edit: Running the above code on another workstation with torch 1.0.1.post2 does not produce either error.
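Since the same code behaves differently on two machines, comparing their environments may narrow things down. A minimal sketch (the helper `cuda_env_report` is hypothetical) that collects the PyTorch version, the CUDA version PyTorch was built against, and, if a GPU is visible, its name and compute capability:

```python
import torch

def cuda_env_report() -> dict:
    """Collect version and device facts relevant to CUDA/cuBLAS errors."""
    info = {
        'torch': torch.__version__,
        'built_with_cuda': torch.version.cuda,  # None for CPU-only builds
        'cuda_available': torch.cuda.is_available(),
    }
    if info['cuda_available']:
        info['device_name'] = torch.cuda.get_device_name(0)
        info['capability'] = torch.cuda.get_device_capability(0)
    return info

print(cuda_env_report())
```

Running it on both the Colab instance and the workstation shows whether the CUDA build or the GPU model differs between the working and failing setups.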
However, when I run the demo from the phosa repository, I get the same CUBLAS_STATUS_EXECUTION_FAILED error, this time from cublasSgemmStridedBatched. The result is the same with or without CUDA_LAUNCH_BLOCKING=1:
$ CUDA_LAUNCH_BLOCKING=1 python demo.py --filename input/easy_bat.jpg --class_name bat
2021-03-26 18:06:07,542 INFO Calling with args: Namespace(class_name='bat', filename='input/easy_bat.jpg', lw_collision=None, lw_depth=None, lw_inter=None, lw_inter_part=None, lw_scale=None, lw_scale_person=None, lw_sil=None, mesh_index=0, output_dir='output')
2021-03-26 18:06:10,955 INFO Loading checkpoint from detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl
2021-03-26 18:06:10,962 INFO URL https://dl.fbaipublicfiles.com/detectron2/PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl cached in /home/grad3/jalal/.torch/fvcore_cache/detectron2/PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl
2021-03-26 18:06:11,069 INFO Reading a file from 'Detectron2 Model Zoo'
WARNING: You are using a SMPL model, with only 10 shape coefficients.
class_name: bat
0%| | 0/800.0 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "demo.py", line 145, in <module>
    main(get_args())
  File "demo.py", line 121, in main
    instances=instances, class_name=args.class_name, mesh_index=args.mesh_index
  File "/scratch3/research/code/phosa/phosa/pose_optimization.py", line 406, in find_optimal_poses
    num_initializations=num_initializations,
  File "/scratch3/research/code/phosa/phosa/pose_optimization.py", line 287, in find_optimal_pose
    vertices=torch.matmul(vertices.unsqueeze(0), rotations_init),
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
0%| | 0/800.0 [00:00<?, ?it/s]
Segmentation fault
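The failing line in pose_optimization.py multiplies `vertices.unsqueeze(0)` by `rotations_init`, and the traceback shows it going through cublasSgemmStridedBatched, the batched form of the same GEMM. A hypothetical, self-contained stand-in for that call can reproduce the operation without the rest of the demo. The shapes are assumptions (V vertices of shape (V, 3) and N matrices of shape (N, 3, 3)); the real shapes and values used by phosa are not shown in the post:

```python
import torch

def batched_vertex_rotation(device: str = 'cpu', num_vertices: int = 1000,
                            num_inits: int = 20) -> torch.Tensor:
    """Hypothetical stand-in for the failing matmul in pose_optimization.py."""
    # Made-up inputs: random points and random 3x3 matrices (not true rotations)
    vertices = torch.rand(num_vertices, 3, device=device)
    rotations_init = torch.rand(num_inits, 3, 3, device=device)
    # (1, V, 3) @ (N, 3, 3) broadcasts over the batch dimension to (N, V, 3)
    return torch.matmul(vertices.unsqueeze(0), rotations_init)

out = batched_vertex_rotation('cpu')
print(out.shape)  # torch.Size([20, 1000, 3])
if torch.cuda.is_available():
    print(batched_vertex_rotation('cuda').shape)
```

If the CUDA call fails here too, the issue is reproducible with a single batched matmul and is independent of the demo's data.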