PyTorch metric problem RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

SwapnanilHalder · May 11, 2020, 9:26pm

I am trying to build an ANN deep learning model with PyTorch, with separate functions for calculating the loss and for evaluating the model on the validation set. I am getting the error RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

My code is:

model = MNISTModel()

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.sum(preds == labels).item() / len(preds)

def loss_batch(model, loss_fn, x, y, opt = None, metric = None):
    x = Variable(x).cuda()
    y = Variable(y.cuda())
    preds = model(x)
    loss = loss_fn(preds, y)
    
    if opt is not None :
        loss.backward()
        opt.step()
        opt.zero_grad()
        
    metric_result = None
    if metric is not None:
        metric_result = metric(preds, y)
        
    return loss.item(), len(x), metric_result

def val_evaluate(model, loss_fn, val_loader, metric = None):
    with torch.no_grad():
        results = [loss_batch(model, loss_fn, x, y, metric) for x,y in val_loader]
        
        losses, nums, metrics = zip(*results)
        
        total = np.sum(nums)

        # Avg. loss across batches 
        avg_loss = np.sum(np.multiply(losses, nums)) / total
        avg_metric = None
        if metric is not None:
            # Avg. of metric across batches
            avg_metric = np.sum(np.multiply(metrics, nums)) / total
    return avg_loss, total, avg_metric

def fit(epochs, lr, model, loss_fn, train_dl, 
        valid_dl, metric=None, opt_fn=None):
    losses, metrics = [], []
    
    # Instantiate the optimizer
    if opt_fn is None: opt_fn = torch.optim.SGD
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    
    for epoch in range(epochs):
        # Training
        for xb,yb in train_dl:
            loss,_,_ = loss_batch(model, loss_fn, Variable(xb), Variable(yb), opt)

        # Evaluation
        result = val_evaluate(model, loss_fn, valid_dl, metric)
        val_loss, total, val_metric = result
        
        # Record the loss & metric
        losses.append(val_loss)
        metrics.append(val_metric)
        
        # Print progress
        if metric is None:
            print('Epoch [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, epochs, val_loss))
        else:
            print('Epoch [{}/{}], Loss: {:.4f}, {}: {:.4f}'
                  .format(epoch+1, epochs, val_loss, 
                          metric.__name__, val_metric))
    return losses, metrics

losses1, metrics1 = fit(5, 0.5, model, nn.CrossEntropyLoss(), 
                        train_loader, val_loader, accuracy)

By comparing it to known-working code, I found that when I pass metric=metric to loss_batch in val_evaluate (the line that builds results), I no longer get this error. But when I pass only metric, this error appears:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

But the accuracy metric has nothing to do with grad_fn, so why does this error occur?

ptrblck · May 12, 2020, 6:29am

You are calling loss_batch(model, loss_fn, x, y, metric) in val_evaluate inside the torch.no_grad() block, which is the correct thing to do.
However, loss_batch expects the arguments: model, loss_fn, x, y, opt = None, metric = None, so you are currently passing metric as the opt argument.
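Inside loss_batch, the opt is not None check then passes, so loss.backward() is executed. Because the call happens inside the torch.no_grad() block, loss has no grad_fn, and backward raises the error you are seeing.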

Passing metric=metric solves this issue, as opt is then indeed None.
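
To make the argument binding concrete, here is a minimal sketch in plain Python (no PyTorch needed). It uses a stub with the same signature as loss_batch from the question and inspects which parameter the fifth argument lands in. The stub bodies, the string placeholders, and the loss_batch_kwonly variant are assumptions for illustration only; they are not part of the original code.

```python
import inspect


def loss_batch(model, loss_fn, x, y, opt=None, metric=None):
    """Stub with the same signature as in the question; body omitted."""


def accuracy(outputs, labels):
    """Placeholder metric; only its identity matters in this sketch."""


sig = inspect.signature(loss_batch)

# Call shape used in val_evaluate: the fifth positional argument binds to `opt`.
positional = sig.bind("model", "loss_fn", "x", "y", accuracy)
# Call shape with the fix: `metric` is bound and `opt` keeps its default (None).
keyword = sig.bind("model", "loss_fn", "x", "y", metric=accuracy)

print(positional.arguments["opt"] is accuracy)  # the metric was bound to opt
print("metric" in positional.arguments)         # metric was never passed
print(keyword.arguments["metric"] is accuracy)  # the metric is bound correctly
print("opt" in keyword.arguments)               # opt was not passed


# Hypothetical variant: the bare `*` makes opt and metric keyword-only,
# so the accidental positional call fails immediately with a TypeError.
def loss_batch_kwonly(model, loss_fn, x, y, *, opt=None, metric=None):
    """Stub only; shows the signature change, not the training logic."""


try:
    loss_batch_kwonly("model", "loss_fn", "x", "y", accuracy)
except TypeError as err:
    print("TypeError:", err)
```

If you control the function, making the optional parameters keyword-only is one possible way to turn this kind of mix-up into an immediate TypeError at the call site instead of a confusing autograd error later.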

SwapnanilHalder · May 12, 2020, 1:40pm

Ya, I just figured it out. Such a silly mistake to overlook. Sorry for bothering you.