Gradient of (test) Loss wrt (training) Input

I’m trying to implement Dataset Distillation using basic optimizers and autograd. A crucial part of the algorithm is training the network on a small (synthetic) dataset, evaluating the loss of this trained network on another dataset, and computing the gradient of that final loss wrt the training data.


When I try to implement this directly, using an optimizer to update the weights and a loss function from torch.nn for the loss, autograd gives the following error:

`One of the differentiated Tensors appears to not have been used in the graph`
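
Here is a minimal sketch of what I’m doing (a toy linear model standing in for the real network; all names, shapes, and hyperparameters are just illustrative):

```python
import torch
import torch.nn as nn

# Synthetic training data we want gradients for.
x_train = torch.randn(10, 5, requires_grad=True)
y_train = torch.randint(0, 2, (10,))
# The other dataset used for the final loss.
x_test = torch.randn(50, 5)
y_test = torch.randint(0, 2, (50,))

model = nn.Linear(5, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Inner training loop on the synthetic data.
for _ in range(5):
    opt.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    opt.step()  # in-place, non-differentiable update

test_loss = loss_fn(model(x_test), y_test)
# Raises the error above: nothing in test_loss's graph links back to x_train.
grad = torch.autograd.grad(test_loss, x_train)
```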

How can this be implemented using basic autograd and optimizers?

Hi,

None of the optimizers in torch.optim perform their updates in a differentiable manner.
Similarly, the layers in nn store their weights as nn.Parameter objects, which are leaf Tensors. That means they cannot have gradient history, so you cannot backpropagate through their updates.
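
What you can do instead is keep the weights as plain Tensors and write the inner SGD update yourself, passing create_graph=True to autograd.grad so the updates stay in the graph. A minimal sketch with a toy linear model (the data, shapes, and hyperparameters are placeholders, not part of your setup):

```python
import torch
import torch.nn.functional as F

# The distilled training set is what we differentiate w.r.t.
x_train = torch.randn(10, 5, requires_grad=True)
y_train = torch.randint(0, 2, (10,))
x_test = torch.randn(50, 5)
y_test = torch.randint(0, 2, (50,))

# Weights as plain Tensors (not nn.Parameter), so updated versions can be non-leaf.
w = torch.randn(2, 5, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
params = (w, b)
lr = 0.1

# Inner loop: manual SGD with create_graph=True keeps the updates differentiable.
for _ in range(5):
    inner_loss = F.cross_entropy(F.linear(x_train, *params), y_train)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    # Out-of-place update: the new params now depend on x_train.
    params = tuple(p - lr * g for p, g in zip(params, grads))

# Outer loss on the other dataset, using the trained parameters.
test_loss = F.cross_entropy(F.linear(x_test, *params), y_test)

# Gradient of the test loss w.r.t. the training inputs.
(grad_x_train,) = torch.autograd.grad(test_loss, x_train)
print(grad_x_train.shape)  # torch.Size([10, 5])
```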

That said, I would recommend checking out the higher library from FAIR (https://github.com/facebookresearch/higher), which implements this kind of differentiable inner loop for you.
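
With higher, the same thing looks roughly like this (an untested sketch reusing the toy setup from above; higher.innerloop_ctx gives you a functional copy of the model and a differentiable version of the optimizer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import higher  # pip install higher

x_train = torch.randn(10, 5, requires_grad=True)
y_train = torch.randint(0, 2, (10,))
x_test = torch.randn(50, 5)
y_test = torch.randint(0, 2, (50,))

model = nn.Linear(5, 2)
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1)

with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
    # Inner loop: each step is recorded in the autograd graph.
    for _ in range(5):
        inner_loss = F.cross_entropy(fmodel(x_train), y_train)
        diffopt.step(inner_loss)  # differentiable update

    test_loss = F.cross_entropy(fmodel(x_test), y_test)
    # Now there is a path from x_train through the updates to test_loss.
    (grad_x_train,) = torch.autograd.grad(test_loss, x_train)
```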