I’ve been trying to wrap my head around autograd for a while now. From what I’ve learned about automatic differentiation in class, it is possible to compute the gradient of functions written as code, even when they overwrite variables, use for loops, etc. Additionally, the autograd tutorial states:
autograd package provides automatic differentiation for all operations on Tensors.
However, I understand that not everything is differentiable (e.g. a black-box function).
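To illustrate what I mean by differentiating through loops and overwrites, here is a minimal sketch. The values are my own toy example and are not taken from the tutorial:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# y is rebound at every iteration; autograd still records each multiplication.
y = x
for _ in range(3):
    y = y * x  # after the loop, y = x ** 4

y.backward()
print(x.grad)  # d(x^4)/dx = 4 * x^3 = 32 at x = 2
```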
I’m currently reimplementing in PyTorch a neural network with a layer that performs a lookup operation in a table with an index derived from a prediction. That is, some floating-point input is rounded and cast as an integer in order to be used as an array index. Here is a mockup of what could be the forward pass of this layer:
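    import torch
    from torch.nn import Module

    class LookupLayer(Module):
        def __init__(self):
            super().__init__()
            self.lookup_array = torch.arange(10)

        def forward(self, x):
            # Round to the nearest integer and cast to an integer dtype.
            index = torch.round(x).int()
            # Clamp the index to the valid range of the table.
            index = torch.clamp(index, 0, len(self.lookup_array) - 1)
            return self.lookup_array[index.long()]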
The grad attribute of the parameters of the model for all layers above this one is None, which I assume is due to autograd being unable to compute the derivative of this layer. That makes sense to me, as I don’t think that int() has a derivative.
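A small check that seems to support this assumption is sketched below. The input values are arbitrary: the rounded tensor is still tracked by autograd, but the integer tensor produced by the cast is not.

```python
import torch

x = torch.tensor([1.2, 3.7], requires_grad=True)

rounded = torch.round(x)  # float tensor, still part of the autograd graph
index = rounded.int()     # integer tensor, no longer tracked by autograd

print(rounded.requires_grad, rounded.grad_fn is not None)
print(index.requires_grad, index.grad_fn)
```

Since everything computed from index is disconnected from x, nothing upstream of the layer can receive a gradient through it.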
The author of the paper I am reproducing provides the backward pass implementation of the layer, so I believe that I should now make my layer a subclass of torch.autograd.Function rather than torch.nn.Module and reimplement the backward pass there myself. Please correct me if I am wrong.
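Here is a rough sketch of what I have in mind. The name LookupFunction is hypothetical, and the backward pass is only a placeholder: it passes the incoming gradient straight through, which is an assumption of mine and not the backward pass from the paper. I also assume a floating-point table, since an integer output cannot carry a gradient.

```python
import torch


class LookupFunction(torch.autograd.Function):
    """Hypothetical custom Function wrapping the lookup."""

    @staticmethod
    def forward(ctx, x, lookup_array):
        index = torch.round(x).long()
        index = torch.clamp(index, 0, lookup_array.numel() - 1)
        return lookup_array[index]

    @staticmethod
    def backward(ctx, grad_output):
        # Placeholder (assumption): straight-through gradient.
        # The paper's backward pass would replace this line.
        # One return value per forward input; the table gets no gradient.
        return grad_output, None


class LookupLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Float table (assumption) so that the output can require grad.
        self.register_buffer("lookup_array", torch.arange(10, dtype=torch.float32))

    def forward(self, x):
        return LookupFunction.apply(x, self.lookup_array)


x = torch.tensor([1.2, 3.7, 42.0], requires_grad=True)
y = LookupLayer()(x)
y.sum().backward()
print(y)       # looked-up values: 1, 4 and 9 (42 is clamped to the last index)
print(x.grad)  # all ones with the straight-through placeholder
```

With this structure the layer stays an nn.Module, and only the non-differentiable part is moved into the Function.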
My questions are:
Is it trivial to tell whether autograd is going to be able to differentiate a function or not?
If the forward pass is correct and if autograd is able to compute a gradient, is this gradient always correct?
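To make the second question more concrete, here is a small experiment with arbitrary values of my own. torch.round on its own is tracked by autograd and returns a gradient without any error, but that gradient is zero everywhere, so "a gradient was computed" does not seem to imply "the gradient is useful". The snippet also uses torch.autograd.gradcheck, which compares autograd's result against finite differences for double-precision inputs, on a smooth function.

```python
import torch

# Autograd differentiates round() without complaint, but the gradient is zero.
x = torch.tensor([1.2, 3.7], requires_grad=True)
y = (torch.round(x) * 2.0).sum()
y.backward()
print(x.grad)  # tensor([0., 0.])

# Numerical check of autograd's gradient on a smooth function.
x64 = torch.tensor([0.3, 1.1], dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(torch.sin, (x64,))
print(ok)  # True
```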
I’ve watched this lecture to try to find answers to these questions, but I could not quite understand everything. If you have more resources I would be happy to have them.