From the discussion on the PyTorch forums ([ConvTranspose1d extremely slow on GPU (T4), slower than CPU]), I understand that the first iteration of a GPU operation can be slow. I am training a model with convolutional layers on input images of random resolutions; in each epoch, images are sampled at roughly 48 different resolutions. As a result, the convolution operations are extremely slow at the beginning of every epoch. Do you have any advice on how to mitigate this?
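
For context, here is a minimal sketch of the kind of loop I mean. The model and the exact resolutions are placeholders, not my real setup; the point is just that the first forward pass at each new input shape is much slower than later passes at that same shape:

```python
import time
import random
import torch
import torch.nn as nn

# Placeholder model standing in for my actual convolutional network
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
).cuda()

# Roughly 48 distinct resolutions per epoch (illustrative values only)
resolutions = [(random.randrange(128, 512, 8), random.randrange(128, 512, 8))
               for _ in range(48)]

for h, w in resolutions:
    x = torch.randn(8, 3, h, w, device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    out = model(x)
    torch.cuda.synchronize()
    # The first pass at each new (h, w) takes far longer than repeated passes
    print(f"{h}x{w}: {time.time() - start:.3f}s")
```
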