I am encountering an issue while using NVIDIA TAO Toolkit 5.5. I followed the instructions in the Grounding DINO sample notebook from the official NVIDIA GitHub repository.
Steps to Reproduce:
- Installed NVIDIA TAO Toolkit 5.5.
- Configured the train.yaml file for training Grounding DINO.
- Ran the following training command:
    !tao model grounding_dino train \
        -e $SPECS_DIR/train.yaml \
        train.num_gpus=$NUM_TRAIN_GPUS \
        results_dir=$RESULTS_DIR
- During training, the following error was raised:
    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(…)
System Details:
- GPU: NVIDIA GeForce GTX TITAN X
- Driver Version: 555.42.06
- CUDA Version: 12.5 (nvidia-smi)
- CUDA Version: 12.4 (miniconda-environment)
- PyTorch Version: 2.5.1+cu124 (miniconda-environment)
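For anyone comparing setups, the PyTorch-side values above can be gathered with a small script like the one below. This is only a sketch: `collect_env_report` is a hypothetical helper name, not part of TAO or PyTorch, and it reports what the Python environment it runs in sees, which may differ from what `nvidia-smi` shows for the driver.

```python
import platform

try:
    import torch  # optional: the report still works without PyTorch installed
except ImportError:
    torch = None


def collect_env_report():
    """Return a dict describing the Python/PyTorch/CUDA setup of this environment."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda_build": None,
        "gpus": [],
    }
    if torch is None:
        return report

    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda

    if torch.cuda.is_available():
        for idx in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(idx)
            report["gpus"].append({
                "name": torch.cuda.get_device_name(idx),
                "compute_capability": f"{major}.{minor}",
            })
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

The compute capability line is the most relevant one for this error, since it identifies the GPU generation independently of the driver and toolkit versions.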
Additional Notes:
- Using Automatic Mixed Precision (AMP) with bfloat16.
- Reducing the batch size and disabling AMP partially mitigated the issue but training still fails.
- Input size and dataset configurations follow the default settings in the sample notebook.
I suspect the issue may be related to:
- Incompatibility of bfloat16 precision with my GPU.
- A mismatch in CUDA/PyTorch versions.
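The first suspicion can be checked against the GPU's compute capability. The GTX TITAN X is a Maxwell card with compute capability 5.2, while native bfloat16 support arrived with Ampere (compute capability 8.0). The sketch below encodes that as a simple rule. The function names are hypothetical, and the float16 cutoff at 7.0 (the first generation with Tensor Cores) is a rule-of-thumb assumption, not something taken from TAO or PyTorch. It does not address the second suspicion about CUDA/PyTorch versions.

```python
def supports_native_bf16(capability):
    """True if a (major, minor) compute capability is Ampere (8.0) or newer."""
    major, _minor = capability
    return major >= 8


def suggest_amp_dtype(capability):
    """Suggest a training precision for a (major, minor) compute capability.

    Assumed rule of thumb:
      >= 8.0 -> bfloat16 (native support)
      >= 7.0 -> float16  (Tensor Cores, but no native bfloat16)
      else   -> float32  (mixed precision is unlikely to help)
    """
    if supports_native_bf16(capability):
        return "bfloat16"
    if tuple(capability) >= (7, 0):
        return "float16"
    return "float32"


if __name__ == "__main__":
    # Compute capability of the GTX TITAN X (Maxwell). On a live system this
    # tuple can be read with torch.cuda.get_device_capability(0).
    titan_x = (5, 2)
    print("native bf16:", supports_native_bf16(titan_x))  # native bf16: False
    print("suggested dtype:", suggest_amp_dtype(titan_x))  # suggested dtype: float32
```

If this reasoning holds, the bfloat16 GEMM call would be the likely source of CUBLAS_STATUS_NOT_SUPPORTED on this card, and full float32 training would be the precision to try.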
Any guidance or suggestions would be greatly appreciated!