Error during inference

Abhi_Agarwal · January 5, 2023, 2:19pm

I get the following error when trying to run the demo code for multi-object tracking from this repo (see Demo encountering Error · Issue #64 · megvii-research/MOTR · GitHub):

Traceback (most recent call last):
  File "demo.py", line 284, in <module>
    detector.run()
  File "demo.py", line 254, in run
    res = self.model.inference_single_image(cur_img.cuda().float(), (self.dataloader.seq_h, self.dataloader.seq_w), track_instances)
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/h1/abhishek/blog/MOTR/models/motr.py", line 586, in inference_single_image
    track_instances=track_instances)
  File "/h1/abhishek/blog/MOTR/models/motr.py", line 515, in _forward_single_image
    hs, init_reference, inter_references, enc_outputs_class, enc_outputs_coord_unact = self.transformer(srcs, masks, pos, track_instances.query_pos, ref_pts=track_instances.ref_pts)
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 162, in forward
    memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 266, in forward
    output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 237, in forward
    src = self.forward_ffn(src)
  File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 225, in forward_ffn
    src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/functional.py", line 1612, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Can someone help me debug what the issue is?

I’m running this with the following Dockerfile:

FROM nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04

RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6  -y

ptrblck · January 5, 2023, 11:08pm

The Docker container you are using is quite old, and I don’t know how you are building or installing PyTorch inside it.
Could you update the base container (and PyTorch, in case you are using an older release) and check whether you still run into this issue?
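
To make the "which PyTorch and CUDA am I actually running" question concrete, here is a minimal sketch that reports the installed build. The function name `describe_torch_env` is hypothetical (not part of PyTorch or MOTR); it only reads `torch.__version__`, `torch.version.cuda`, and the `torch.cuda` device queries, and it returns empty fields if PyTorch cannot be imported.

```python
def describe_torch_env():
    """Collect basic facts about the installed PyTorch build (hypothetical helper)."""
    info = {"torch": None, "cuda_build": None, "cuda_available": False, "devices": []}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; return the empty report.
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            # Record the device name and its compute capability, e.g. "sm_70".
            info["devices"].append(
                (torch.cuda.get_device_name(i), "sm_{}{}".format(major, minor))
            )
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print("{}: {}".format(key, value))
```

Posting this output together with the Dockerfile shows whether the GPU's compute capability is newer than what the CUDA build in the container supports. That would be one possible cause of a cuBLAS execution failure, though the thread does not confirm it.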

Abhi_Agarwal · January 6, 2023, 12:43pm

Installing a higher version of CUDA breaks Deformable DETR for some reason. I am just following the installation instructions of the repo linked above. For PyTorch and cudatoolkit, I am using:

conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch

ptrblck · January 6, 2023, 6:20pm

I would recommend checking what exactly breaks in Deformable DETR with newer releases, as neither PyTorch 1.5.1 nor CUDA 9.2 will receive any backports of fixes.
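
One way to separate a broken PyTorch/CUDA stack from a MOTR-specific problem is to run the same kind of call that fails in the traceback (`F.linear`, which dispatches to a cuBLAS GEMM) outside the model. The sketch below is an assumption-laden check, not a diagnosis: the helper name `check_linear_on_gpu` and the tensor sizes are hypothetical and do not come from the MOTR code.

```python
def check_linear_on_gpu(in_features=256, out_features=1024, rows=4096):
    """Run a standalone F.linear on the GPU and report what happened.

    Hypothetical helper; the default sizes are illustrative only.
    """
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return "torch-missing"

    if not torch.cuda.is_available():
        return "no-cuda"

    x = torch.randn(rows, in_features, device="cuda")
    w = torch.randn(out_features, in_features, device="cuda")
    b = torch.randn(out_features, device="cuda")

    try:
        F.linear(x, w, b)
        # CUDA kernels run asynchronously; synchronize so that any
        # execution error is raised here and not at a later call.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return "failed: {}".format(err)
    return "ok"


if __name__ == "__main__":
    print(check_linear_on_gpu())
```

If this standalone call also fails, the problem most likely lies in the container's PyTorch/CUDA/GPU combination and not in the model code. Because CUDA errors are reported asynchronously, rerunning the demo with the environment variable `CUDA_LAUNCH_BLOCKING=1` set can also help the stack trace point at the operation that actually failed.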