CUDA 10.1 error using ConvTranspose2d with output_padding=1

This might be off topic, but I did not find where to report a bug on the NVIDIA website. I tried training a GAN based on the pix2pixHD architecture with Amp 'O1' opt_level, using CUDA 10.1, cuDNN 7.6, a PyTorch nightly build, and Ubuntu 18.04.
Code breaks here:
File "./", line 263, in
File "/usr/local/lib/python3.6/dist-packages/torch/", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/", line 93, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: _cublasOpFromChar input should be 't', 'n' or 'c' but got `

The following code should reproduce the behaviour:

import torch
import torch.nn as nn
import torch.optim as optim
from apex import amp

class Modelis(nn.Module):
    def __init__(self):
        super().__init__()
        # downsample by 2, then upsample back; output_padding=1 restores the input size
        self.conv = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3,
                              stride=2, padding=1)
        self.deconv = nn.ConvTranspose2d(in_channels=256, out_channels=128,
                                         kernel_size=3, stride=2, padding=1,
                                         output_padding=1)

    def forward(self, x):
        x = self.conv(x)
        x = self.deconv(x)
        return x

criterion = nn.BCEWithLogitsLoss()
netG = Modelis()
netG = netG.cuda()
optimizerG = optim.Adam(netG.parameters(), lr=0.001, betas=(0.5, 0.999))
netG, optimizerG = amp.initialize(netG, optimizerG, opt_level='O1')

for i in range(100):
    batch = (torch.randn(8, 128, 16, 16).cuda() - 0.5) * 2
    output = netG(batch)
    loss = criterion(output, torch.ones_like(output))

    optimizerG.zero_grad()
    with amp.scale_loss(loss, optimizerG) as scaled_loss:
        scaled_loss.backward()  # fails here with output_padding=1 under O1
    optimizerG.step()
It works in fp32 mode. After some experimenting I found that the backward pass breaks on nn.ConvTranspose2d when output_padding is 1 (not 0). Using the PyTorch Docker container (CUDA 10.0, cuDNN 7, PyTorch 1.0), both fp32 and fp16 work fine.
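As a stopgap until this is fixed, one workaround sketch (my own suggestion, not verified against the full pix2pixHD setup) is to avoid output_padding entirely: run the transposed convolution with output_padding=0 and zero-pad the missing row and column afterwards. This is shape-equivalent but not numerically identical, since output_padding=1 produces the extra border from the kernel rather than filling it with zeros:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelisPadded(nn.Module):
    # hypothetical workaround: same output shape without output_padding=1
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3,
                              stride=2, padding=1)
        # with output_padding=0, a 16x16 input comes back as 15x15 instead of 16x16
        self.deconv = nn.ConvTranspose2d(in_channels=256, out_channels=128,
                                         kernel_size=3, stride=2, padding=1,
                                         output_padding=0)

    def forward(self, x):
        x = self.conv(x)        # 16x16 -> 8x8
        x = self.deconv(x)      # 8x8 -> 15x15
        # zero-pad one column (right) and one row (bottom) to recover 16x16;
        # unlike output_padding=1, the padded border is zeros
        x = F.pad(x, (0, 1, 0, 1))
        return x

With this change the backward pass no longer goes through the output_padding code path, which is the path that seems to trigger the error.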


Thanks for reporting this issue. We’ll look into it.

We could reproduce this issue and are tracking it here.
Thanks @ngimel for the support in tracking down this issue.