"CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx" During Grounding DINO Training

I am encountering an issue while using the NVIDIA TAO Toolkit version 5.5. I followed the instructions in the Grounding DINO sample notebook available in the official NVIDIA GitHub repository: Grounding DINO Notebook

Steps to Reproduce:

  1. Installed NVIDIA TAO Toolkit 5.5.

  2. Configured the train.yaml file for training Grounding DINO.

  3. Ran the following training command:
    !tao model grounding_dino train \
        -e $SPECS_DIR/train.yaml \
        train.num_gpus=$NUM_TRAIN_GPUS \
        results_dir=$RESULTS_DIR

  4. During training, the following error was raised:
    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(…)

System Details:

  • GPU: NVIDIA GeForce GTX TITAN X
  • Driver Version: 555.42.06
  • CUDA Version: 12.5 (nvidia-smi)
  • CUDA Version: 12.4 (miniconda-environment)
  • PyTorch Version: 2.5.1+cu124 (miniconda-environment)

Additional Notes:

  1. Using Automatic Mixed Precision (AMP) with bfloat16.
    
  2. Reducing the batch size and disabling AMP partially mitigated the issue, but training still fails.
    
  3. Input size and dataset configurations follow the default settings in the sample notebook.
    

I suspect the issue may be related to:

  • Incompatibility of bfloat16 precision with my GPU.
    
  • A mismatch in CUDA/PyTorch versions.
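
Both suspicions can be checked from Python before digging further. The GTX TITAN X is a Maxwell card (compute capability 5.2), while native bfloat16 math was introduced with Ampere (compute capability 8.0), so the first suspicion looks plausible. The CUDA version shown by nvidia-smi is the highest version the driver supports, so 12.5 there alongside a 12.4 runtime is not by itself a mismatch. Below is a rough sketch of such a check; the helper names `supports_native_bf16` and `describe_environment` are made up for illustration and are not part of TAO or PyTorch.

```python
def supports_native_bf16(major: int, minor: int) -> bool:
    """Return True if the compute capability is Ampere (8.0) or newer.

    Assumption: "native bfloat16" is taken to mean compute capability >= 8.0.
    """
    return (major, minor) >= (8, 0)


def describe_environment() -> dict:
    """Collect GPU and CUDA runtime details.

    Returns an empty dict if PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    major, minor = torch.cuda.get_device_capability(0)
    return {
        "device": torch.cuda.get_device_name(0),
        "capability": (major, minor),
        # CUDA runtime version PyTorch was built against (e.g. "12.4")
        "torch_cuda_runtime": torch.version.cuda,
        "native_bf16": supports_native_bf16(major, minor),
    }


if __name__ == "__main__":
    print(describe_environment())
```

If `native_bf16` comes back `False`, switching the training precision away from bfloat16 would be the first thing to try.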
    

Any guidance or suggestions would be greatly appreciated!

Could you isolate which call fails and post a minimal and executable code snippet reproducing the error, please?
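
As a starting point for such a snippet, here is a minimal sketch that assumes the failing call is a bfloat16 batched matrix multiply, since `torch.bmm` on CUDA tensors is one of the operations that can dispatch to `cublasGemmStridedBatchedEx`. That is only a guess and is not confirmed by the traceback above. The function name `try_bf16_bmm` and the tensor shapes are arbitrary choices for illustration.

```python
def try_bf16_bmm(batch: int = 4, n: int = 64) -> str:
    """Run one bfloat16 batched matmul on the GPU and report what happened.

    Returns "ok", "failed: <error>", or "skipped: <reason>".
    """
    try:
        import torch
    except ImportError:
        return "skipped: PyTorch not installed"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device"
    try:
        a = torch.randn(batch, n, n, device="cuda", dtype=torch.bfloat16)
        b = torch.randn(batch, n, n, device="cuda", dtype=torch.bfloat16)
        torch.bmm(a, b)
        # CUDA kernels run asynchronously; synchronize so any error surfaces here
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"failed: {exc}"
    return "ok"


if __name__ == "__main__":
    print(try_bf16_bmm())
```

If this prints the same CUBLAS_STATUS_NOT_SUPPORTED message outside of TAO, the problem is reproducible with plain PyTorch on this GPU, and repeating the test with `torch.float32` would show whether the data type is the trigger.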