Bug when running TorchScript on H100

mikeybydun1 · April 12, 2025, 8:20pm

Hello,
I have a TorchScript model built with torch 1.13.0 (CUDA build).
I compile the PyTorch code into a .pt file and then run the model.
It works well on every other GPU I have tried (an A100, for example),
but when I run the same code on an NVIDIA H100 the results become NaN.
Do you have any idea why?
Which PyTorch version should I use, and what do I need to configure?
Thanks!

ptrblck · April 12, 2025, 9:02pm

PyTorch 1.13.0 was released with CUDA 11.6 and 11.7, while support for the Hopper architecture was introduced in CUDA 11.8, so you would need to update your PyTorch binary to one built with CUDA 11.8 or newer.
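
To confirm the mismatch on your own machine, something like the following might help. It is only a rough sketch: the helper name `binary_targets_gpu` is made up for this example, and matching on an exact `sm_XY` entry is a simplification (it ignores PTX forward compatibility), so treat a negative result as a hint rather than proof.

```python
import torch


def binary_targets_gpu(capability, arch_list):
    """Return True if arch_list has a native sm_XY entry for this capability.

    Hypothetical helper for illustration; an exact-match check is an assumption
    and does not account for PTX (compute_XY) forward compatibility.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


print("PyTorch version:", torch.__version__)
print("Built against CUDA:", torch.version.cuda)  # None on CPU-only builds

if torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)  # (9, 0) on an H100
    archs = torch.cuda.get_arch_list()         # architectures compiled into this binary
    print("GPU:", torch.cuda.get_device_name(0), "capability:", cap)
    print("Architectures in this binary:", archs)
    if not binary_targets_gpu(cap, archs):
        print("No native kernels for this GPU; a newer PyTorch binary is likely needed.")
else:
    print("No CUDA device visible.")
```

If `sm_90` is missing from the printed list on the H100 machine, that would be consistent with the binary predating Hopper support.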

mikeybydun1 · April 12, 2025, 9:04pm

Thank you very much, my friend.
I have another question. I have a TorchScript model that computes
torch.prod over a tensor of shape (1000, 400, 400, 144).
This takes a long time (10 seconds on an A100).
The only effective optimization I have found is using bfloat16.
Do you have any other suggestions?
Thanks!

ptrblck · April 13, 2025, 7:16pm

Using a lower-precision dtype sounds like a good idea.
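
A minimal sketch of how one might check what the lower precision costs in accuracy before committing to it. The reduced dimension (`dim=-1`) and the input value range are assumptions, since the thread does not state them, and the tensor is a small stand-in because the full shape has roughly 23 billion elements.

```python
import torch

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Small stand-in for the (1000, 400, 400, 144) tensor from the question.
# Values in [0.5, 1.5) are an assumption chosen to keep the products finite.
x = torch.rand(8, 16, 16, 144, device=device) + 0.5

# Reference: reduce over the last dimension in float32 (assumed dimension).
ref = torch.prod(x, dim=-1)

# Lower-precision variant: half the bytes per element, fewer mantissa bits.
low = torch.prod(x.to(torch.bfloat16), dim=-1)

# Compare against the float32 result to see how much accuracy is lost.
rel_err = ((low.float() - ref).abs() / ref.abs()).max().item()
print("output shape:", tuple(low.shape), "dtype:", low.dtype)
print("max relative error vs float32:", rel_err)
```

Whether the reported error is acceptable depends on what the product feeds into downstream.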