Normal distribution: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Good day!

I am trying to train a model to learn the parameters of multiple Normal distributions, but I am running into a strange error.
Here is my code:

import torch
import torch.nn as nn
import random

def gen_data(num_classes):
    # Pick a random ground-truth mean and std for each class
    in_mean = [random.randint(-5, +5) for i in range(num_classes)]
    in_std = [random.random() + random.randint(1, 3) for i in range(num_classes)]

    X = torch.tensor([])
    y = []
    idx = 0
    for mean, std in zip(in_mean, in_std):
        # Draw a random number of samples from this class' distribution
        normal = torch.distributions.Normal(mean, std)
        n = random.randint(1, 100)
        sample = normal.sample((n,))
        X = torch.cat([X, sample])
        y += [idx] * n
        idx += 1
    y = torch.tensor(y).float()
    return X, y

class NaiveBayes(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # One learnable mean and std per class
        self.mean = nn.Parameter(torch.randn(num_classes), requires_grad=True)
        self.std = nn.Parameter(torch.abs(torch.randn(num_classes)), requires_grad=True)
        self.num_classes = num_classes

    def forward(self, x):
        # (num_samples,) -> (num_samples, num_classes)
        x = x.repeat((self.num_classes, 1)).permute(1, 0)
        normal = torch.distributions.Normal(self.mean, self.std)
        # Log-probability of every sample under every class distribution
        return normal.log_prob(x)

num_classes = 3
model = NaiveBayes(num_classes)
X, y = gen_data(num_classes)
criterion = torch.nn.MSELoss()

y_pred = model(X)          # shape: (num_samples, num_classes)
y_pred = y_pred.argmax(1)  # shape: (num_samples, )

loss = criterion(y, y_pred)
loss.backward()

After running this, the last line raises RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. Are the Normal distribution parameters trainable? If so, what is this error supposed to mean?

torch==1.8.1

Thank you!

Hi,

the problem is that argmax is not a differentiable operation (it is piecewise constant), so no gradient can flow back through the argmax.
As a result, the loss you get at the end does not require gradients, because it is not computed in a differentiable manner from any input that requires gradients.
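
For example, inspecting requires_grad and grad_fn before and after the argmax on the code above shows exactly where the graph is cut (log_probs and preds are just placeholder names):

log_probs = model(X)
print(log_probs.requires_grad)  # True  -- log_prob of the Parameters is differentiable
print(log_probs.grad_fn)        # a backward node, the graph is still connected

preds = log_probs.argmax(1)
print(preds.requires_grad)      # False -- argmax returns plain integer indices
print(preds.grad_fn)            # None  -- the graph stops here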

You most likely want to use softmax here and a criterion for classification that can handle that (like nll_loss).
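
A minimal sketch of that change, assuming the log-probabilities returned by forward are used directly as logits and the labels are converted to integer class indices (the optimizer, learning rate, and number of steps below are only illustrative):

model = NaiveBayes(num_classes)
X, y = gen_data(num_classes)
y = y.long()  # CrossEntropyLoss / nll_loss expect integer class indices

criterion = torch.nn.CrossEntropyLoss()  # log_softmax + NLL in one step
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    optimizer.zero_grad()
    logits = model(X)            # (num_samples, num_classes), still differentiable
    loss = criterion(logits, y)  # no argmax here, so gradients reach mean and std
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        model.std.clamp_(min=1e-3)  # keep the scale positive, Normal requires std > 0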

Thank you so much @albanD! Yes indeed, I now see that argmax is not differentiable, and I can use CrossEntropyLoss instead. But how could one tell that from the error message in this case, how are the two related? Also, maybe it would be helpful to issue a warning to the user that the differentiation flow has stopped there? Or would that not be possible?

This is the part of the message that says that the Tensor you called backward() on does not require gradients, which means it was involved in a non-differentiable op somewhere along the way.
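
For what it's worth, the same error can be reproduced in isolation by calling backward() on a tensor that has no grad_fn, which is exactly what the argmax produced here (a standalone snippet, not from the original code):

import torch

t = torch.randn(5, 3, requires_grad=True)
out = t.argmax(dim=1).float()          # argmax detaches the result from the graph
print(out.requires_grad, out.grad_fn)  # False None
out.sum().backward()                   # RuntimeError: element 0 of tensors does not
                                       # require grad and does not have a grad_fn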