Hello. I’m hitting a strange error and can’t find its cause.
When training a pix2pix model with AMP disabled, i.e. using “with autocast(enabled=False):”, everything works: the model trains with no issues at all. With AMP enabled, however, the backward step “amp_scaler.scale(loss).backward()” raises “RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED”.
I narrowed it down: the error appears only when the Discriminator loss is computed with autocast enabled, and the subsequent backward step then fails. Disabling autocast for just this step (while keeping it enabled everywhere else) also works fine, which means the Generator, including its GAN loss component, trains without problems under autocast.
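For reference, here is a minimal sketch of the setup described above, assuming a standard pix2pix-style loop with one shared GradScaler and two optimizers. The tiny `G` and `D` networks, the `train_step` function, the `amp_for_d_loss` flag and the L1 weight of 100 are hypothetical stand-ins, not my actual code. Setting `amp_for_d_loss=False` reproduces the workaround (autocast off only for the Discriminator loss); `True` is the configuration that fails for me on GPU. It falls back to CPU with AMP off when CUDA is unavailable, so it only illustrates the structure and does not reproduce or explain the cuDNN error.

```python
import torch
import torch.nn as nn

# AMP is only enabled on CUDA in this sketch; on CPU everything runs in fp32.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Hypothetical toy stand-ins for the pix2pix Generator and PatchGAN Discriminator.
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh()).to(device)
D = nn.Sequential(nn.Conv2d(6, 1, 4, stride=2, padding=1)).to(device)  # outputs logits

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
amp_scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # shared by both optimizers
adv_loss = nn.BCEWithLogitsLoss()  # autocast-safe, unlike nn.BCELoss


def train_step(src, tgt, amp_for_d_loss):
    # Generator forward pass under autocast.
    with torch.autocast(device_type=device, enabled=use_amp):
        fake = G(src)

    # Discriminator loss: autocast can be switched off for this step only.
    with torch.autocast(device_type=device, enabled=use_amp and amp_for_d_loss):
        # .float() keeps the inputs fp32 when autocast is disabled here.
        real_logits = D(torch.cat([src, tgt], dim=1))
        fake_logits = D(torch.cat([src, fake.detach().float()], dim=1))
        d_loss = 0.5 * (adv_loss(real_logits, torch.ones_like(real_logits))
                        + adv_loss(fake_logits, torch.zeros_like(fake_logits)))
    opt_D.zero_grad(set_to_none=True)
    amp_scaler.scale(d_loss).backward()  # the backward step that fails with AMP on
    amp_scaler.step(opt_D)

    # Generator loss (GAN + L1 terms) under autocast.
    with torch.autocast(device_type=device, enabled=use_amp):
        gen_logits = D(torch.cat([src, fake], dim=1))
        g_loss = (adv_loss(gen_logits, torch.ones_like(gen_logits))
                  + 100.0 * nn.functional.l1_loss(fake, tgt))
    opt_G.zero_grad(set_to_none=True)
    amp_scaler.scale(g_loss).backward()
    amp_scaler.step(opt_G)

    amp_scaler.update()  # update the scale once per iteration, after both steps
    return d_loss.item(), g_loss.item()


torch.manual_seed(0)
src = torch.randn(2, 3, 16, 16, device=device)
tgt = torch.randn(2, 3, 16, 16, device=device)
d_val, g_val = train_step(src, tgt, amp_for_d_loss=False)
print(f"D loss {d_val:.4f}  G loss {g_val:.4f}")
```

The GradScaler pattern here (step each optimizer, then a single `update()` per iteration) follows the multi-optimizer usage in the PyTorch AMP examples.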
Any clues as to what could be causing this?