Hello,
I have a TorchScript model that uses torch 1.13.0 (the CUDA build).
I compile the PyTorch code into a .pt file and then run the model.
It works well on every GPU I have tried (the A100, for example),
but when I run the same code on an NVIDIA H100, the results become NaN.
Do you have any idea why?
Is it the PyTorch version? What do I need to configure?
Thanks!
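For reference, a minimal sketch of the workflow described above: compile a model to TorchScript, save it as a .pt file, load it back, and check the output for NaN or inf values. `TinyModel` is a hypothetical stand-in for the real model, and the sketch uses `torch.jit.trace` (assumed here; `torch.jit.script` produces the same kind of .pt file). Running the same script on an A100 and on an H100 is one way to confirm that only the device changes.

```python
import os
import tempfile

import torch


class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for the real model."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


def export_and_reload(model: torch.nn.Module, example: torch.Tensor, path: str):
    # Compile to TorchScript by tracing, write the .pt file, then load it back.
    traced = torch.jit.trace(model.eval(), example)
    traced.save(path)
    return torch.jit.load(path)


def has_non_finite(t: torch.Tensor) -> bool:
    # True if any element is NaN or +/-inf.
    return not bool(torch.isfinite(t).all())


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    example = torch.randn(8, 16)
    with tempfile.TemporaryDirectory() as tmp:
        loaded = export_and_reload(TinyModel(), example, os.path.join(tmp, "model.pt"))
    loaded = loaded.to(device)
    with torch.no_grad():
        out = loaded(example.to(device))
    print(f"{device}: non-finite values in output: {has_non_finite(out)}")
```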
PyTorch 1.13.0 was released with CUDA 11.6 and 11.7, while support for the Hopper architecture (H100) was introduced in CUDA 11.8, so you would need to update to a PyTorch binary built with CUDA 11.8 or newer.
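A quick way to check this on the machine itself is to compare the GPU's compute capability with the architectures the installed binary was compiled for. The sketch below relies on `torch.cuda.get_device_capability()` (which reports (9, 0) for an H100 and (8, 0) for an A100) and `torch.cuda.get_arch_list()`; the helper name `native_sm_supported` is hypothetical. It only looks for a native `sm_XY` entry, so treat a missing entry as a strong hint that the binary predates the GPU rather than as proof.

```python
def native_sm_supported(capability, arch_list):
    """True if arch_list contains native kernels (sm_XY) for compute capability (X, Y).

    capability: e.g. (9, 0) for H100, as returned by torch.cuda.get_device_capability().
    arch_list: e.g. the list returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report():
    try:
        import torch  # imported lazily so the helper above is usable without PyTorch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("PyTorch", torch.__version__, "built with CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    cap = torch.cuda.get_device_capability()
    archs = torch.cuda.get_arch_list()
    print("GPU:", torch.cuda.get_device_name(), "compute capability", cap)
    print("Architectures in this binary:", archs)
    if not native_sm_supported(cap, archs):
        print("No native kernels for this GPU: install a build with a newer CUDA toolkit")


if __name__ == "__main__":
    report()
```

A binary built with CUDA 11.7 cannot contain `sm_90` kernels, so on an H100 this check would flag it, while the same binary passes on an A100 (`sm_80`).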
Thank you very much, my friend.
I have another question. I have a TorchScript model that does:
torch.prod on a tensor of shape (1000, 400, 400, 144).
This takes a long time (10 seconds on an A100).
The only effective optimization I have found is using bfloat16.
Do you have any other suggestions?
Thanks!
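For scale, a back-of-the-envelope calculation (plain Python, no PyTorch needed) of how much data a single pass over this tensor reads. It assumes, as is typical for reductions like `torch.prod`, that the kernel is limited by memory bandwidth rather than arithmetic; under that assumption, halving the bytes per element is a plausible reason bfloat16 helped. The names `SHAPE`, `BYTES_PER_ELEMENT`, and `tensor_bytes` are illustrative.

```python
from math import prod

SHAPE = (1000, 400, 400, 144)
# Storage size of one element for each dtype.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def tensor_bytes(shape, dtype_name):
    """Total bytes needed to store a dense tensor of this shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype_name]


if __name__ == "__main__":
    print(f"elements: {prod(SHAPE):,}")
    for name in BYTES_PER_ELEMENT:
        print(f"{name:>8}: {tensor_bytes(SHAPE, name) / 1e9:.2f} GB")
```

The tensor has 23,040,000,000 elements, so a full read is about 92 GB in float32 and about 46 GB in bfloat16.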
Using a lower-precision dtype sounds like a good idea.
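A minimal sketch for measuring the trade-off: it times `torch.prod` in float32 and bfloat16 and reports the largest relative error of the bfloat16 result against a float64 reference. The shape is scaled down to 1/1000 of the original, and reducing over the last dimension is an assumption, since the thread does not say which dim is reduced; the helper names are hypothetical.

```python
import time

import torch


def time_prod(x: torch.Tensor, dim: int, iters: int = 10) -> float:
    """Average wall-clock seconds per torch.prod(x, dim=dim) call."""
    torch.prod(x, dim=dim)  # warm-up so one-time setup is not timed
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.prod(x, dim=dim)
    if x.is_cuda:
        torch.cuda.synchronize()  # CUDA launches are async; wait before reading the clock
    return (time.perf_counter() - start) / iters


def bf16_max_relative_error(x: torch.Tensor, dim: int) -> float:
    """Largest relative error of a bfloat16 prod against a float64 reference."""
    ref = torch.prod(x.double(), dim=dim)
    approx = torch.prod(x.to(torch.bfloat16), dim=dim).double()
    return ((approx - ref).abs() / ref.abs()).max().item()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # 1/1000 of (1000, 400, 400, 144); reducing over the last dim is an assumption.
    # Values in [0.5, 1.5) keep the 144-term products away from 0 and inf.
    x32 = torch.rand(4, 200, 200, 144, device=device) + 0.5
    x16 = x32.to(torch.bfloat16)
    for name, x in (("float32", x32), ("bfloat16", x16)):
        print(f"{name:>8}: {time_prod(x, dim=-1) * 1e3:.2f} ms")
    print("max relative error (bfloat16):", bf16_max_relative_error(x32, dim=-1))
```

bfloat16 keeps float32's 8-bit exponent, so long products overflow or underflow at roughly the same magnitudes as in float32, whereas float16 (largest finite value 65504) would overflow much sooner. The cost is precision: bfloat16 carries only about 2 to 3 significant decimal digits, so it is worth checking the reported error against your tolerance.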