There is something I’m not understanding well about how autograd works. I am trying to implement a maximum-likelihood model in PyTorch. The model maximizes the likelihood of a softmax distribution over some parameters, given a batch of data. My model class looks like this:
```python
class Model(nn.Module):
    def __init__(self, param_size, kernel_size, max_value):
        super(Model, self).__init__()
        self.kernel_size = kernel_size
        self.max_value = max_value
        self.params = torch.ones(param_size, requires_grad=True).float()

    def forward(self, data):
        pot_data, norm_data = data
        ### global potential computation ###
        potential = torch.mul(pot_data, self.params)
        potential = torch.sum(potential, (1, 2))
        ### global normalization computation ###
        normalization = torch.mul(norm_data, self.params)
        normalization = torch.sum(normalization, dim=2)
        normalization = torch.logsumexp(-normalization, dim=1)
        normalization = torch.sum(normalization, dim=1)
        batch_likelihood = torch.sum(-potential - normalization)
        return batch_likelihood
```
I have a standard training loop with some data pre-processing:
```python
for index, (img, label) in enumerate(dataloader):
    data = F.unfold(torch.unsqueeze(img, 1), kernel_size=model.kernel_size, stride=1, padding=0)
    batch_size = data.size(0)
    nb_patchs = data.size(-1)
    nb_repeats = model.max_value
    pot_data = (data[:, :, :] != data[:, 4, :].view((batch_size, 1, nb_patchs)))
    dup_data = torch.unsqueeze(data, 1).repeat(1, model.max_value, 1, 1)
    ranges = torch.unsqueeze(torch.unsqueeze(torch.arange(0, model.max_value, 1), 0), 2)
    ranges = ranges.repeat(batch_size, 1, nb_patchs)
    dup_data[:, :, 4, :] = ranges
    norm_data = (dup_data[:, :, :, :] != dup_data[:, :, 4, :].view((batch_size, nb_repeats, 1, nb_patchs)))
    data = (pot_data, norm_data)

    optimizer.zero_grad()
    likelihood = model.forward(data)
    (-likelihood).backward()
    optimizer.step()
```
At every point in the training, `self.params.grad_fn` is `None`, and I don’t understand why. Also, if I save my model with `torch.save(model, "model.pt")` and inspect it with something like https://netron.app/, my computation graph doesn’t contain any operation.
However, my likelihood does still get minimized, and my parameters get updated at every batch.
This is very confusing to me: I would expect `self.params.grad_fn` to be different from `None`, for example `MulBackward` after every `torch.mul` in the forward function, but this is not the case. Why is that?
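To make the confusion concrete, here is a minimal standalone snippet reproducing what I observe (a toy tensor standing in for my `self.params`):

```python
import torch

# Toy stand-in for self.params: a tensor I create myself with requires_grad=True.
params = torch.ones(3, requires_grad=True)

out = torch.mul(params, 2.0)

print(params.grad_fn)  # None, even after the multiplication
print(out.grad_fn)     # <MulBackward0 ...> -- only the result of the op gets one
```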
Also, given that `self.params.grad_fn` is `None`, how does the optimizer compute a gradient to update my parameters at every batch?
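What I can verify, though, is that gradients do end up in `.grad` after `backward()` and that the optimizer uses them (toy snippet with a made-up scalar loss in place of my likelihood):

```python
import torch

params = torch.ones(3, requires_grad=True)
optimizer = torch.optim.SGD([params], lr=0.1)

loss = torch.sum(params * 2.0)  # stand-in for my negative likelihood
optimizer.zero_grad()
loss.backward()

print(params.grad_fn)  # still None
print(params.grad)     # tensor([2., 2., 2.]) -- the gradient is stored here
optimizer.step()
print(params)          # tensor([0.8000, 0.8000, 0.8000], ...) -- updated from .grad
```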
I would be grateful for any help in better understanding how autograd works in this case.