Stack expects a non-empty Tensor List in PyTorch while using gradient clipping

I want to use gradient clipping:

torch.nn.utils.clip_grad_norm(loss, model.parameters())

I’m getting the error "stack expects a non-empty Tensor List". I read in the forums that @tom highlighted that the gradients need to be computed first (via loss.backward()) before calling clip_grad_norm_ (there isn’t much info on this in the documentation). However, I’m updating the model’s parameters in a slightly different way:

grad = torch.autograd.grad(loss, model.parameters())

and then I do the update manually.
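Roughly, the manual update looks like this (a plain SGD-style step purely for illustration; lr is a placeholder learning rate):

with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p -= lr * g  # SGD-style step; the actual update rule may differ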

How can gradient clipping be used in this case?


Personally, I’d call that variable grads rather than grad.

You can grab the implementation of clip_grad_norm_ from the PyTorch source and then convert

for p in parameters:
    something with p.grad

into

for gr in grads:
    something with gr
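
For instance, here is a minimal sketch of that conversion (it mirrors the logic of clip_grad_norm_; the name clip_grads_norm_, the max_norm/norm_type arguments, and the exact details are just an illustration and may differ from the current PyTorch source):

import torch

def clip_grads_norm_(grads, max_norm, norm_type=2.0):
    # Same idea as torch.nn.utils.clip_grad_norm_, but operating on a
    # tuple/list of gradient tensors (e.g. from torch.autograd.grad)
    # instead of reading p.grad from a list of parameters.
    grads = [g for g in grads if g is not None]
    if len(grads) == 0:
        # An empty list like this is what triggers "stack expects a
        # non-empty Tensor List" in the built-in version when no
        # gradients have been computed yet.
        return torch.tensor(0.0)
    total_norm = torch.norm(
        torch.stack([torch.norm(g.detach(), norm_type) for g in grads]),
        norm_type,
    )
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for g in grads:
            g.detach().mul_(clip_coef)  # clip in place
    return total_norm

You would then call it on the grads tuple before the manual update, e.g.

grads = torch.autograd.grad(loss, model.parameters())
total_norm = clip_grads_norm_(grads, max_norm=1.0)
# ... proceed with the manual parameter update using the clipped grads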

Best regards

Thomas

Thanks for replying, but I’m still not sure how to apply the clip norm in this case. Inside the function, we don’t calculate the grads, so do I compute them first and then make the modifications? I see that the clipping operation is applied to the params; what changes do I make if I want an equivalent operation on the grads? And is all of this really needed? The only difference is in how I update the model params.