Conv3D runs very slow in fp16 and bf16

Thank you very much for pointing out those issues and possible solutions. After testing several combinations of torch and cuDNN versions, I found that:

  • torch 2.8.0 + cuDNN 9.10
  • torch 2.9.1 + cuDNN 9.15

both resolve the problem. It appears that torch 2.9.0 is the problematic version, as Conv3D is still slow in fp16 and bf16 with either cuDNN version (9.10 or 9.15).
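
In case it helps anyone else compare their own setup, here is a rough sketch of how the versions can be checked and the Conv3D speed compared across dtypes. This is not the exact script I used: the helper names `report_versions` and `time_conv3d`, the layer size, and the input shape are arbitrary choices for illustration, so the absolute numbers will differ from a real model.

```python
import time

import torch


def report_versions():
    """Collect the versions that matter when comparing torch/cuDNN combinations."""
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # integer, or None if cuDNN is unavailable
    }


def time_conv3d(dtype, device, iters=20):
    """Return the average seconds per forward pass of a small Conv3d.

    The channel count and input shape are hypothetical, not taken from the issue.
    """
    conv = torch.nn.Conv3d(16, 16, kernel_size=3, padding=1).to(device=device, dtype=dtype)
    # Sample in float32 first, then cast, so this also works where randn lacks a half kernel.
    x = torch.randn(1, 16, 8, 32, 32).to(device=device, dtype=dtype)

    with torch.no_grad():
        for _ in range(3):  # warm-up, so one-time kernel selection is not timed
            conv(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # CUDA kernels are asynchronous

        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        if device.type == "cuda":
            torch.cuda.synchronize()

    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    print(report_versions())
    if torch.cuda.is_available():
        device = torch.device("cuda")
        for dtype in (torch.float32, torch.float16, torch.bfloat16):
            ms = time_conv3d(dtype, device) * 1e3
            print(f"{dtype}: {ms:.3f} ms per forward pass")
    else:
        print("No CUDA device found; the fp16/bf16 comparison needs a GPU.")
```

On an affected install, I would expect the fp16 and bf16 timings to be much worse than fp32, while on a working combination they should be comparable or faster. A larger input may be needed to make the gap obvious.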

Thanks again for your help!