I was using clip_grad_value_ from torch.nn.utils.clip_grad to clip the gradients of a resnet18 from torchvision.models. However, I encountered the error below and have no idea how to debug it.
File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/nn/utils/clip_grad.py", line 122, in clip_grad_value_
grouped_grads = _group_tensors_by_device_and_dtype([grads])
File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 42, in _group_tensors_by_device_and_dtype
torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices).items()
RuntimeError: Expected nested_tensorlist[0].size() > 0 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
from torchvision.models import resnet18
from torch.nn.utils import clip_grad_value_
from torch import randn
x = randn(10, 3, 32, 32)
model = resnet18()
loss = model(x).sum()
clip_grad_value_(model.parameters(), 1)
I could reproduce this locally. One of the problems here is that you are not calling .backward() on the loss, so autograd never runs and the list of grads inside clip_grad_value_ is empty.
e.g. this snippet should work:
from torchvision.models import resnet18
from torch.nn.utils import clip_grad_value_
from torch import randn
x = randn(10, 3, 32, 32)
model = resnet18()
loss = model(x).sum()
loss.backward()
clip_grad_value_(model.parameters(), 1)
However, clip_grad_value_ should return without error in the edge case where all the grads are None; I will submit a PR to fix this.
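Until such a fix lands, a defensive pattern is to skip clipping when no gradients exist yet. The wrapper name below is hypothetical, not part of PyTorch; it is a minimal sketch of the idea:

```python
import torch
from torch.nn.utils import clip_grad_value_

def safe_clip_grad_value_(parameters, clip_value):
    # Hypothetical wrapper: only pass parameters that actually have a .grad,
    # so the call is a no-op instead of raising when autograd never ran.
    with_grads = [p for p in parameters if p.grad is not None]
    if with_grads:
        clip_grad_value_(with_grads, clip_value)

# A parameter whose backward was never called has grad=None:
p = torch.nn.Parameter(torch.randn(3))
safe_clip_grad_value_([p], 1.0)  # no error, simply does nothing
```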
I think I’m running into the same issue (NGC 24.01 container):
Expected !nested_tensorlist[0].empty() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
This is definitely after backward(), and the weird thing is that I call
clip_grad_norm_(parameters, max_norm, foreach=True)
clip_grad_value_(parameters, max_value, foreach=True)
The former works; I can even call it twice. But the latter does not. I'd expect norm_ and value_ to behave the same…
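One thing worth checking in this situation: if some parameters in the list were never touched by backward() (e.g., a frozen layer), their .grad is still None, and the two functions may handle that differently on the foreach path. A sketch of the workaround, filtering out None grads before clipping (the frozen-layer setup here is an assumption used only to illustrate the situation):

```python
import torch
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

# A model with a frozen second layer, whose grads stay None after backward().
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))
for p in model[1].parameters():
    p.requires_grad_(False)  # frozen -> .grad remains None

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Keep only parameters that actually received a gradient,
# then both clipping calls operate on a non-empty grad list.
with_grads = [p for p in model.parameters() if p.grad is not None]
clip_grad_norm_(with_grads, max_norm=1.0, foreach=True)
clip_grad_value_(with_grads, clip_value=0.5, foreach=True)
```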