Gradient penalty for Lipschitzness not working with convolutional layers

I am trying to impose a Lipschitz constraint on an encoder network using a gradient penalty, as in WGAN-GP. The backward pass on the gradient penalty term works, and the model trains, as long as the encoder consists only of linear (fully connected) layers. When I switch to, or add, convolutional layers, I get the error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Loss and gradient penalty code:

recon_loss = F.mse_loss(X_sample, X, size_average=False) / mb_size

X_recon = P2(z_dis)
recon_loss_P2 = F.mse_loss(X_recon, X, size_average=False) / mb_size
loss = recon_loss + recon_loss_P2

# gradient penalty (effectively a second-order derivative)
# grad is torch.autograd.grad; z_con is the encoder output Q(X), and X requires grad
gradQ = grad(z_con.mean(), X, create_graph=True)
gradQ0 = gradQ[0]
gradQ_norm = gradQ0.norm()
gradient_penalty = (gradQ_norm - 0.0).pow(2)
loss.backward(retain_graph=True)
gradient_penalty.backward()  # this line gives the error

Encoder code:

class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

Q = nn.Sequential(
    nn.Conv2d(1, 1, 4, stride=2, padding=1),  # removing this line makes the error go away
    Flatten(),
    nn.Linear(784, 10)
)

Why might this be happening? I am also open to imposing the gradient penalty in another way if that works with convolutional layers.
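
For reference, here is a minimal, self-contained sketch of the setup with everything unrelated stripped out. The 1x56x56 input size and the batch size of 8 are assumptions, chosen only so that the conv output flattens to the 784 features the linear layer expects. P2 and the reconstruction losses are left out. On a recent PyTorch release, where double backward through Conv2d is supported, I would expect this to run without the error, so the PyTorch version may be part of the problem.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

torch.manual_seed(0)


class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)


# Assumed input: 1x56x56, so the stride-2 conv yields 1x28x28 = 784 features.
Q = nn.Sequential(
    nn.Conv2d(1, 1, 4, stride=2, padding=1),
    Flatten(),
    nn.Linear(784, 10),
)

mb_size = 8  # hypothetical batch size
# The input must require grad so that d(z_con)/dX can be computed.
X = torch.randn(mb_size, 1, 56, 56, requires_grad=True)

z_con = Q(X)

# create_graph=True keeps the graph of the gradient itself,
# so the penalty can be backpropagated into the weights of Q.
gradQ0 = grad(z_con.mean(), X, create_graph=True)[0]
gradient_penalty = gradQ0.norm().pow(2)

# Second-order backward pass: this is the call that fails for me.
gradient_penalty.backward()
```

If this runs, Q[0].weight.grad and Q[2].weight.grad should both be populated afterwards. The bias gradients stay None, since the input gradient does not depend on the biases.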