Hi @albanD, this is really strange, because the code OP posted works for me without any error:
$ python test_cublas.py
tensor([[ 1.0000e+00, 4.2183e-06, 6.3342e-07, ..., -2.5928e-06,
-8.4937e-07, -4.7684e-06],
[-1.1551e-06, 1.0000e+00, 3.6079e-07, ..., -1.7285e-06,
-1.6391e-07, -5.0068e-06],
[-5.3546e-07, 3.8960e-06, 1.0000e+00, ..., -1.8477e-06,
-6.8545e-07, -4.8876e-06],
...,
[-9.6698e-07, 2.2674e-06, -2.1878e-07, ..., 1.0000e+00,
-2.9802e-08, -2.7418e-06],
[ 9.3132e-07, 3.5167e-06, 1.7881e-07, ..., -2.9802e-06,
1.0000e+00, -4.6492e-06],
[-2.9802e-08, 5.2452e-06, 7.1526e-07, ..., -2.7716e-06,
-8.1770e-07, 9.9999e-01]])
tensor([[ 1.0000e+00, -3.4571e-06, 6.5565e-07, ..., -2.5034e-06,
5.9605e-07, 4.7684e-07],
[ 1.0490e-05, 1.0000e+00, 9.5367e-07, ..., -6.6757e-06,
-4.7684e-06, -9.5367e-06],
[ 0.0000e+00, -2.9802e-06, 1.0000e+00, ..., -2.3842e-07,
-2.7418e-06, -1.0490e-05],
...,
[-2.3842e-07, 2.5034e-06, -2.9802e-07, ..., 1.0000e+00,
2.5332e-06, 7.8678e-06],
[ 3.8147e-06, -1.4305e-06, 3.5763e-07, ..., -1.9073e-06,
1.0000e+00, 1.9073e-06],
[ 7.1526e-06, -1.6689e-06, -8.3447e-07, ..., 0.0000e+00,
-3.0994e-06, 1.0000e+00]], device='cuda:0')
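The script itself isn't shown here. The two near-identity outputs (one on CPU, one on `cuda:0`) look like the result of multiplying a matrix by its inverse on each device. A rough sketch of that kind of check follows. `identity_check` is a hypothetical name, and the random-matrix setup is an assumption, not OP's actual code:

```python
import torch


def identity_check(n: int = 1000, device: str = "cpu") -> torch.Tensor:
    """Hypothetical sanity check: A @ inverse(A) should be close to the identity."""
    torch.manual_seed(0)
    a = torch.randn(n, n, device=device)
    # A single 2D matmul, i.e. a plain (non-batched) GEMM on the GPU.
    return a @ torch.inverse(a)


if __name__ == "__main__":
    print(identity_check())
    if torch.cuda.is_available():
        print(identity_check(device="cuda"))
```

A check like this only exercises a single 2D matrix product. The failing line in the traceback below is a broadcasted batched matmul, which goes through a different cuBLAS routine, so passing it does not rule that path out.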
However, when I run a repo's code, I get exactly the same error (with or without CUDA_LAUNCH_BLOCKING=1):
$ CUDA_LAUNCH_BLOCKING=1 python demo.py --filename input/easy_bat.jpg --class_name bat
2021-03-26 18:06:07,542 INFO Calling with args: Namespace(class_name='bat', filename='input/easy_bat.jpg', lw_collision=None, lw_depth=None, lw_inter=None, lw_inter_part=None, lw_scale=None, lw_scale_person=None, lw_sil=None, mesh_index=0, output_dir='output')
2021-03-26 18:06:10,955 INFO Loading checkpoint from detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl
2021-03-26 18:06:10,962 INFO URL https://dl.fbaipublicfiles.com/detectron2/PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl cached in /home/grad3/jalal/.torch/fvcore_cache/detectron2/PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl
2021-03-26 18:06:11,069 INFO Reading a file from 'Detectron2 Model Zoo'
WARNING: You are using a SMPL model, with only 10 shape coefficients.
class_name: bat
0%| | 0/800.0 [00:00<?, ?it/s]Traceback (most recent call last):
File "demo.py", line 145, in <module>
main(get_args())
File "demo.py", line 121, in main
instances=instances, class_name=args.class_name, mesh_index=args.mesh_index
File "/scratch3/research/code/phosa/phosa/pose_optimization.py", line 406, in find_optimal_poses
num_initializations=num_initializations,
File "/scratch3/research/code/phosa/phosa/pose_optimization.py", line 287, in find_optimal_pose
vertices=torch.matmul(vertices.unsqueeze(0), rotations_init),
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
0%| | 0/800.0 [00:00<?, ?it/s]
Segmentation fault
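One way to narrow this down is to run the failing line, `torch.matmul(vertices.unsqueeze(0), rotations_init)`, on its own. A `(1, V, 3)` tensor multiplied by an `(N, 3, 3)` tensor broadcasts to an `(N, V, 3)` batched product, which is what ends up in `cublasSgemmStridedBatched`. The sketch below compares that pattern on the GPU against a CPU reference. The helper names and the default sizes are hypothetical placeholders; the real shapes would have to be printed from `pose_optimization.py`:

```python
import os

# Make CUDA kernel launches synchronous so an error surfaces at the call that caused it.
# This must be set before CUDA is initialized (before the first CUDA operation).
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch


def batched_rotate(vertices: torch.Tensor, rotations: torch.Tensor) -> torch.Tensor:
    """Same pattern as the failing line: (1, V, 3) @ (N, 3, 3) -> (N, V, 3)."""
    return torch.matmul(vertices.unsqueeze(0), rotations)


def check(num_vertices: int = 1000, num_rotations: int = 400) -> float:
    """Return the max abs difference between the device result and a CPU reference.

    The default sizes are hypothetical; substitute the real shapes of
    `vertices` and `rotations_init` from the repo.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    torch.manual_seed(0)
    vertices = torch.randn(num_vertices, 3)
    rotations = torch.randn(num_rotations, 3, 3)
    expected = batched_rotate(vertices, rotations)  # CPU reference
    result = batched_rotate(vertices.to(device), rotations.to(device))
    if device == "cuda":
        torch.cuda.synchronize()  # surface any asynchronous CUDA error here
    return (result.cpu() - expected).abs().max().item()


if __name__ == "__main__":
    print("max abs diff:", check())
```

If this standalone version also fails on the GPU, the problem is likely in the CUDA/cuBLAS setup rather than in the repo. If it passes, the actual shapes, dtypes, or state of the tensors at that line in the repo are the next thing to inspect.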