How to use autograd to get gradients with respect to the input?

Shisho_Sama · September 29, 2019, 5:32pm

Hello everyone, I hope you are having a great time.
I'm trying to create a contractive autoencoder in PyTorch. I found this thread and tried to follow it.
This is the snippet I wrote:


import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Contractive_AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(784, 512)
        self.decoder = nn.Linear(512, 784)

    def forward(self, input):
        # flatten the input
        shape = input.shape
        input = input.view(input.size(0), -1)
        output_e = F.relu(self.encoder(input))
        output = F.sigmoid(self.decoder(output_e))
        output = output.view(*shape)
        return output_e, output

def loss_function(output_e, outputs, imgs, device):
    output_e.backward(torch.ones(output_e.size()).to(device), retain_graph=True)
    criterion = nn.MSELoss()
    assert outputs.shape == imgs.shape ,f'outputs.shape : {outputs.shape} != imgs.shape : {imgs.shape}'
    
    imgs.grad.requires_grad = True 
    loss1 = criterion(outputs, imgs)
    print(imgs.grad)
    loss2 = torch.mean(pow(imgs.grad,2))
    loss = loss1 + loss2 
    return loss 

epochs = 50 
interval = 2000
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Contractive_AutoEncoder().to(device)
optimizer = optim.Adam(model.parameters(), lr =0.001)

for e in range(epochs):
    for i, (imgs, labels) in enumerate(dataloader_train):
        imgs = imgs.to(device)
        labels = labels.to(device)

        outputs_e, outputs = model(imgs)
        loss = loss_function(outputs_e, outputs, imgs,device)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i%interval: 
            print('')

    print(f'epoch/epochs: {e}/{epochs} loss : {loss.item():.4f} ')

For the sake of brevity I used just one layer for the encoder and one for the decoder. It should obviously work regardless of the number of layers in each.
The catch is that, aside from not knowing whether this is the correct way to calculate gradients with respect to the input, I get an error that makes this approach unusable. The line
imgs.grad.requires_grad = True
produces the error:

AttributeError: 'NoneType' object has no attribute 'requires_grad'

I also tried the second method suggested in that thread, which is as follows:

class Contractive_Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(784, 512)
        
    def forward(self, input):
        # flatten the input
        input = input.view(input.size(0), -1)
        output_e = F.relu(self.encoder(input))
        return output_e

class Contractive_Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.decoder = nn.Linear(512, 784)

    def forward(self, input):
        # flatten the input
        output = F.sigmoid(self.decoder(input))
        output = output.view(-1,1,28,28)
        return output


epochs = 50 
interval = 2000
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_enc = Contractive_Encoder().to(device)
model_dec = Contractive_Decoder().to(device)

optimizer = optim.Adam([{"params":model_enc.parameters()},
                        {"params":model_dec.parameters()}], lr =0.001)

optimizer_cond = optim.Adam(model_enc.parameters(), lr = 0.001)

criterion = nn.MSELoss()

for e in range(epochs):
    for i, (imgs, labels) in enumerate(dataloader_train):
        imgs = imgs.to(device)
        labels = labels.to(device)

        outputs_e = model_enc(imgs)
        outputs = model_dec(outputs_e)
        loss_rec = criterion(outputs, imgs)
        optimizer.zero_grad()
        loss_rec.backward()
        optimizer.step()

        imgs.requires_grad_(True)
        y = model_enc(imgs)
        optimizer_cond.zero_grad()
        y.backward(torch.ones(imgs.view(-1,28*28).size()))

        imgs.grad.requires_grad = True
        loss = torch.mean([pow(imgs.grad,2)])
        optimizer_cond.zero_grad()
        loss.backward()
        optimizer_cond.step()
        
        if i%interval: 
            print('')

    print(f'epoch/epochs: {e}/{epochs} loss : {loss.item():.4f} ')

But then I get this error:

RuntimeError: invalid gradient at index 0 - got [128, 784] but expected shape compatible with [128, 512]

How should I go about this?
I would really appreciate any kind of help with this.
Thanks a lot in advance.

Shisho_Sama · October 5, 2019, 2:17pm

Any help with this is greatly appreciated, guys! @albanD, @ptrblck

vdw · October 6, 2019, 2:17am

imgs is a tensor, right? So I think it should be

imgs.requires_grad = True

Maybe this FGSM Tutorial is helpful, since it also relies on getting the gradient with respect to the input.
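To make the suggestion concrete, here is a minimal sketch of getting a gradient with respect to the input. It assumes a single linear encoder with the sizes used in the thread and a random stand-in batch instead of real images. It also shows that the tensor passed to backward() has to match the shape of the output it is called on, which is what the "expected shape compatible with [128, 512]" error is about.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Linear(784, 512)      # same sizes as the encoder in the thread
imgs = torch.rand(4, 1, 28, 28)    # stand-in batch, not real MNIST data

# Autograd is not tracking the input yet, so there is no gradient to read.
grad_before = imgs.grad            # None

# Turn tracking on BEFORE the forward pass.
imgs.requires_grad_(True)
hidden = torch.relu(encoder(imgs.view(imgs.size(0), -1)))

# The gradient argument must have the shape of the tensor backward() is
# called on (here [4, 512]), not the shape of the input.
hidden.backward(torch.ones_like(hidden))

input_grad = imgs.grad             # same shape as imgs: [4, 1, 28, 28]
```

Note that calling backward() with a tensor of ones gives the gradient of the sum of the hidden units, not the full Jacobian.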

Shisho_Sama · October 6, 2019, 2:18am

Thanks a lot, I really appreciate it. Looking into this right now.

Shisho_Sama · October 6, 2019, 4:43am

OK, I ended up changing the loss function as follows. However, is it correct, or did I mess something up here again?

def loss_function(output_e, outputs, imgs, device):
    criterion = nn.MSELoss()
    assert outputs.shape == imgs.shape ,f'outputs.shape : {outputs.shape} != imgs.shape : {imgs.shape}'
    
    loss1 = criterion(outputs, imgs)
    output_e.backward(torch.ones(output_e.size()).to(device), retain_graph=True)    
    loss2 = torch.mean(pow(imgs.grad,2))
    imgs.grad.data.zero_()
    loss = loss1 + loss2 
    return loss

the notable changes are

Instead of imgs.grad.requires_grad = True, I used imgs.requires_grad = True. It doesn't make sense to me to make the grads require grad while imgs itself doesn't require it in the first place, and it always results in an error because imgs.grad is None (NoneType has no attribute requires_grad).
In order for imgs to have gradients, you need to remember two things:
First, autograd only stores .grad for leaf tensors that require gradients. A batch that comes straight from the dataloader is a leaf, so setting requires_grad is enough for it. If the input is a non-leaf tensor (the result of an operation on a tensor that requires grad), call retain_grad() on it before running backward().
Second, requires_grad is not retroactive, which means it must be set prior to running forward().
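The leaf versus non-leaf distinction can be checked directly. This is a small sketch with made-up tensors (x, h and y are illustrative names, not from the model above):

```python
import torch

x = torch.rand(2, 3)       # plain tensor: a leaf that does not require grad
x.requires_grad_(True)     # still a leaf, now tracked by autograd

h = x * 2                  # result of an operation: a non-leaf tensor
h.retain_grad()            # ask autograd to keep h.grad after backward()

y = x * 3                  # another non-leaf, without retain_grad()

(h.sum() + y.sum()).backward()

leaf_flags = (x.is_leaf, h.is_leaf, y.is_leaf)  # (True, False, False)
x_grad = x.grad            # populated: x is a leaf that requires grad
h_grad = h.grad            # populated only because of retain_grad()
```

Reading y.grad here would give None (with a warning in recent PyTorch versions), because y is a non-leaf without retain_grad().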

So the training loop should look like this ultimately :

for e in range(epochs):
    for i, (imgs, labels) in enumerate(dataloader_train):
        imgs = imgs.to(device)
        labels = labels.to(device)

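        # requires_grad must be set before the forward pass
        imgs.requires_grad_(True)
        # retain_grad() only matters for non-leaf tensors and must come after requires_grad_(True)
        imgs.retain_grad()
        
        outputs_e, outputs = model(imgs)
        loss = loss_function(outputs_e, outputs, imgs, device)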

        imgs.requires_grad_(False)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'epoch/epochs: {e}/{epochs} loss: {loss.item():.4f}')
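One thing that may be worth checking in the loss above: imgs.grad filled by a plain backward() carries no graph, so a penalty built from it is a constant as far as the weights are concerned. A possible alternative, sketched here under assumptions, is torch.autograd.grad with create_graph=True, which keeps the gradient computation differentiable. The names lam, x and the random batch are hypothetical, and the penalty is the same "gradient of the summed hidden units" quantity used in this thread, not the full Frobenius norm of the Jacobian.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Linear(784, 512)
decoder = nn.Linear(512, 784)
lam = 1e-4                        # hypothetical penalty weight

imgs = torch.rand(8, 1, 28, 28)   # stand-in batch, not real data
x = imgs.view(imgs.size(0), -1).requires_grad_(True)

hidden = torch.relu(encoder(x))
recon = torch.sigmoid(decoder(hidden))

# create_graph=True records the gradient computation itself, so the
# penalty below can be backpropagated into the encoder weights.
(input_grad,) = torch.autograd.grad(
    outputs=hidden,
    inputs=x,
    grad_outputs=torch.ones_like(hidden),
    create_graph=True,
)

loss_rec = nn.functional.mse_loss(recon, x.detach())
penalty = input_grad.pow(2).mean()
loss = loss_rec + lam * penalty
loss.backward()
```

With this version penalty.requires_grad is True, so a single optimizer step updates the weights for both terms.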