Hi All,
I am trying to implement a Contractive Autoencoder, which requires a gradient term inside the loss function.
Since getting the Jacobian of a multi-layer encoder is not as easy as reading it off the weights in the single-layer case, which of the following options would you recommend?
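For context, in the single-layer case the "easy" route through the weights is concrete: for a sigmoid encoder h = sigmoid(Wx + b), the squared Frobenius norm of the Jacobian reduces to sum_j (h_j(1-h_j))^2 * sum_i W_ji^2. A quick sanity-check sketch with toy shapes (all names here are made up for illustration):

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3)                    # toy 3 -> 4 sigmoid encoder
b = torch.zeros(4)
x = torch.randn(3, requires_grad=True)

h = torch.sigmoid(W @ x + b)

# Closed form: ||J||_F^2 = sum_j (h_j * (1 - h_j))^2 * sum_i W_ji^2
closed = ((h * (1 - h)) ** 2 * (W ** 2).sum(dim=1)).sum()

# Autograd check: build the Jacobian row by row with torch.autograd.grad
rows = [torch.autograd.grad(h[j], x, retain_graph=True)[0] for j in range(4)]
J = torch.stack(rows)                    # shape (4, 3)
assert torch.allclose(closed, (J ** 2).sum(), atol=1e-5)
```

With more than one layer this closed form no longer applies, hence the two options below.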
Option 1:
optim_rec = SGD(list(enc.parameters()) + list(dec.parameters()))
optim_con = SGD(enc.parameters())
# reconstruction loss
y = enc(x)
x_ = dec(y)
loss = smoothL1(x, x_)
optim_rec.zero_grad()
loss.backward()
optim_rec.step()
# contractive loss
x.requires_grad_(True)
y = enc(x)
optim_con.zero_grad()
# create_graph=True makes x.grad itself part of the graph,
# replacing the old x.grad.requires_grad / x.grad.volatile hack
y.backward(torch.ones_like(y), create_graph=True)
loss = x.grad.pow(2).mean()
optim_con.zero_grad()
loss.backward()
optim_con.step()
Option 2:
optim = SGD(list(enc.parameters()) + list(dec.parameters()))
# losses
x.requires_grad_(True)
y = enc(x)
x_ = dec(y)
optim.zero_grad()
# retain_graph keeps the forward graph alive for the second backward;
# create_graph makes x.grad differentiable (no .volatile hack needed)
y.backward(torch.ones_like(y), retain_graph=True, create_graph=True)
loss = [x.grad.pow(2).mean(), smoothL1(x, x_)]
optim.zero_grad()
sum(loss).backward()
optim.step()
As recommended in the Autograd documentation, Option 1 avoids retain_graph, but it needs two separate forward passes and two optimizers. Both options work for me, but I would like to know whether there is a more elegant way to use Autograd.
Thanks!
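Edit: for concreteness, here is what a single-pass version built on torch.autograd.grad(..., create_graph=True) could look like; it returns a differentiable input gradient directly, so it touches neither x.grad nor the old hacks (toy shapes and modules, not my actual model):

```python
import torch
from torch import nn

torch.manual_seed(0)
enc = nn.Sequential(nn.Linear(8, 4), nn.Sigmoid())   # toy encoder
dec = nn.Linear(4, 8)                                # toy decoder
optim = torch.optim.SGD(
    list(enc.parameters()) + list(dec.parameters()), lr=0.1
)

x = torch.randn(16, 8).requires_grad_(True)
y = enc(x)
x_ = dec(y)

# d(sum y)/dx, kept differentiable via create_graph so the
# contractive penalty below can itself be backpropagated
dydx, = torch.autograd.grad(y.sum(), x, create_graph=True)

loss = nn.functional.smooth_l1_loss(x_, x) + dydx.pow(2).mean()
optim.zero_grad()
loss.backward()
optim.step()
```

Note that y.sum() here is equivalent to backward with a ones vector, so (like both options above) this penalizes the gradient of the summed code units rather than the exact Frobenius norm of the Jacobian.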