I’m relatively new to PyTorch but not to deep learning, and I have a question about implementing gradient updates:
Can I pass custom gradients to torch.optim.Adam and similar optimizers?
I’m trying to implement DNI (Decoupled Neural Interfaces), which, in places, uses approximations of gradients (fed into an optimization algorithm such as Adam) to update the parameters.
The way I understand it, each PyTorch optimizer holds a list of the parameters it’s going to update, and when step() is called it reads each parameter’s .grad attribute (on the torch.autograd.Variable) as the input to its optimization procedure, then applies the transformed update to the parameter itself.
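To make my mental model concrete, here is a minimal sketch of the standard loop as I understand it: backward() populates each parameter’s .grad, and the optimizer then consumes those gradients implicitly when stepped.

```python
import torch

# Minimal sketch of the standard training step: backward() fills in
# each parameter's .grad, and the optimizer reads them implicitly.
model = torch.nn.Linear(3, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.1)

x = torch.randn(4, 3)
loss = model(x).pow(2).mean()

opt.zero_grad()
loss.backward()   # populates p.grad for every parameter in the model
assert all(p.grad is not None for p in model.parameters())
opt.step()        # reads p.grad and updates each parameter in place
```

Nowhere in this loop do I get to pass my own gradients in explicitly; they travel through the .grad attribute.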
Now, my question is: how can I sidestep this procedure, where the torch.optim class reads the model’s parameters as a “side effect” instead of taking the gradients as an explicit argument?
I’ve thought of two solutions, neither of which seems to work:
- Manually assigning the .grad attribute and then calling the optimizer - but the documentation says “This attribute is lazily allocated and can’t be reassigned.”
- Manually changing the param.data value - but then I’d have to reimplement the optimizer I want to use myself, which doesn’t seem like a genuine solution to this problem
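For reference, here is a sketch of what I mean by the first option: writing custom gradients into .grad by hand and then letting Adam transform and apply them. (In the PyTorch versions I’ve tried, .grad does appear to accept direct assignment of a tensor, despite what the documentation says; the torch.ones_like gradient below is just a stand-in for a DNI-style synthetic gradient.)

```python
import torch

# Sketch of option 1: hand-write a custom gradient into each p.grad,
# then call opt.step() so Adam consumes it like a normal gradient.
model = torch.nn.Linear(3, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.1)

before = [p.detach().clone() for p in model.parameters()]

for p in model.parameters():
    # Stand-in for a synthetic/approximate gradient (e.g. from DNI).
    p.grad = torch.ones_like(p)

opt.step()  # Adam reads the hand-assigned .grad and updates p in place

after = list(model.parameters())
changed = any(not torch.equal(b, a.detach()) for b, a in zip(before, after))
assert changed  # the parameters moved, so the custom gradients were used
```

If this is actually supported behavior, it would answer my question, but the documentation’s warning makes me unsure whether it’s reliable.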
What is the preferred way of doing this?
I’ve seen one post that touches on this question and links to several projects doing something similar.
But those don’t seem to be genuine solutions either (the DFA project uses the second method I described), and the DNI project doesn’t clearly explain what it is actually doing.
I figured I’d start a topic to discuss a principled way to use custom gradients with torch.optim, and to provide a clear reference for people trying to solve the same problem as me.
I apologize if I’m breaking any rules or missing something obvious.