Hey All,
I am trying to implement a VAE, and I am having trouble computing the gradients for the model. I believe the problem is in the decoder. The exact error message is: "Function AddmmBackward returned an invalid gradient at index 1 - got [10, 32] but expected shape compatible with [10, 1024]". Here is the decoder model:
import torch.nn as nn

class decoderNW(nn.Module):
    def __init__(self):
        super(decoderNW, self).__init__()
        channels = 32
        kernelSize = 4
        padding = (2, 0)
        stride = (2, 2)
        outputpadding = (1, 0)
        self.FC1 = nn.Linear(channels, 1024)
        self.FC2 = nn.Linear(channels, 10656)
        self.deConv3x301 = nn.ConvTranspose2d(channels, 64, kernel_size=kernelSize, stride=stride, output_padding=outputpadding)
        nn.init.xavier_uniform_(self.deConv3x301.weight)
        self.deConv3x302 = nn.ConvTranspose2d(64, 128, kernel_size=kernelSize, stride=stride, output_padding=outputpadding)
        nn.init.xavier_uniform_(self.deConv3x302.weight)
        self.deConv3x303 = nn.ConvTranspose2d(128, 64, kernel_size=kernelSize, stride=stride, output_padding=outputpadding)
        nn.init.xavier_uniform_(self.deConv3x303.weight)
        self.deConv3x304 = nn.ConvTranspose2d(64, 3, kernel_size=kernelSize, stride=stride)
        nn.init.xavier_uniform_(self.deConv3x304.weight)
        self.bn1 = nn.BatchNorm1d(1024)
        self.bn2 = nn.BatchNorm2d(64)
        self.bn3 = nn.BatchNorm2d(128)
        self.bn4 = nn.BatchNorm2d(64)
        self.ReLU = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.FC1(x)
        x = self.bn1(x)
        x = self.ReLU(x)
        x = self.FC2(x)
        # Reshape x to (batch, 32, 9, 37); 32 * 9 * 37 = 10656
        x = x.view(x.size(0), 32, 9, 37)
        x = self.deConv3x301(x)
        x = self.bn2(x)
        x = self.ReLU(x)
        x = self.deConv3x302(x)
        x = self.bn3(x)
        x = self.ReLU(x)
        x = self.deConv3x303(x)
        x = self.bn4(x)
        x = self.ReLU(x)
        x = self.deConv3x304(x)
        x = self.sigmoid(x)
        return x
I believe it happens when I reshape the output of the FC layer into an image-like tensor (channels x height x width) for the deconv layers.
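One way to test the reshape hypothesis without running the model is to work out the shapes by hand. The sketch below is plain Python and only uses the sizes from the decoder above; the helper name deconv_out is hypothetical, and it assumes the output-size formula given in the PyTorch ConvTranspose2d documentation with the default padding of 0 and dilation of 1.

```python
from math import prod

def deconv_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output length along one axis of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

fc2_out_features = 10656
view_shape = (32, 9, 37)  # channels, height, width used in x.view(...)

# The view is only valid if the element counts match.
print(prod(view_shape) == fc2_out_features)  # True

# The first three transposed convolutions use output_padding=(1, 0); the last uses none.
h, w = view_shape[1], view_shape[2]
sizes = []
for out_pad_h, out_pad_w in [(1, 0), (1, 0), (1, 0), (0, 0)]:
    h = deconv_out(h, kernel=4, stride=2, output_padding=out_pad_h)
    w = deconv_out(w, kernel=4, stride=2, output_padding=out_pad_w)
    sizes.append((h, w))
print(sizes)  # [(21, 76), (45, 154), (93, 310), (188, 622)]
```

Since 32 * 9 * 37 equals 10656, the reshape itself looks consistent with the output of FC2, which suggests the mismatch may come from an earlier layer rather than from the view.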
I have tried using the reshape function instead of view, but the same problem persists. I'm not sure where I am going wrong. Any help is greatly appreciated.
Thanks.
PS: I get this error when I run backward(). Here is the code snippet for that:
optimizerVAE.zero_grad()
variationalAE.train()
vaeT = vaeT.to('cuda')
mu, sigma, xHat, z = variationalAE(srcClrT)
loss = vaeLoss(srcClrT, mu, sigma, xHat, z)
loss.backward()
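Because the error names AddmmBackward, which is the backward of a fully connected layer, it may help to isolate the two Linear layers from the training step and run them on the CPU, where a shape mismatch is usually reported during the forward pass. This is only a sketch: the layer sizes are copied from the decoder above, and the batch of 10 latent vectors z is a hypothetical input chosen to match the [10, ...] shapes in the error message.

```python
import torch
import torch.nn as nn

# Same layer sizes as FC1 and FC2 in the decoder above.
fc1 = nn.Linear(32, 1024)
fc2 = nn.Linear(32, 10656)

z = torch.randn(10, 32)  # hypothetical batch of 10 latent vectors

try:
    out = fc2(fc1(z))  # fc1 produces 1024 features, fc2 expects 32
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

If this raises, the two layers do not chain: FC1 outputs 1024 features while FC2 is declared with 32 input features, which would match the [10, 32] versus [10, 1024] shapes in the message.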