ConvTranspose1d extremely slow on GPU (T4), slower than CPU

Thank you for your explanations!

How do I set up PyTorch with cuDNN 8? By compiling from source against a system cuDNN 8 installation?
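Before rebuilding anything, it may help to check which cuDNN version the installed PyTorch build actually reports. A minimal sketch, assuming only a working `torch` install; the variable names are illustrative, and on a CPU-only build the CUDA and cuDNN values are expected to be `None`:

```python
import torch

# Version of the installed PyTorch package.
torch_version = torch.__version__

# CUDA toolkit version this build was compiled against (None on CPU-only builds).
cuda_build_version = torch.version.cuda

# Whether this build was compiled with cuDNN support.
cudnn_available = torch.backends.cudnn.is_available()

# cuDNN version as an integer, or None if cuDNN is not available.
cudnn_version = torch.backends.cudnn.version()

print("PyTorch:", torch_version)
print("CUDA (build):", cuda_build_version)
print("cuDNN available:", cudnn_available)
print("cuDNN version:", cudnn_version)
```

If the reported major version is already 8, a source build may not be needed.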

Are gradients still computed even with torch.no_grad()?
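One way to check this from the autograd side is to compare the output of the same layer with and without `torch.no_grad()`. A small sketch on CPU, with arbitrary layer sizes chosen only for illustration. It shows whether autograd tracks the result, not which kernels the backend launches:

```python
import torch
import torch.nn as nn

# Arbitrary small layer and input, chosen only for illustration.
layer = nn.ConvTranspose1d(in_channels=4, out_channels=4, kernel_size=3)
x = torch.randn(1, 4, 16)

# Normal forward pass: the layer's parameters require grad, so the output is tracked.
y_tracked = layer(x)

# Forward pass under no_grad: autograd does not record the operation.
with torch.no_grad():
    y_untracked = layer(x)

print("tracked:  ", y_tracked.requires_grad, y_tracked.grad_fn is not None)
print("untracked:", y_untracked.requires_grad, y_untracked.grad_fn)
```

If the second output has `requires_grad=False` and no `grad_fn`, autograd is not building a graph for it.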