I was using clip_grad_value_ from torch.nn.utils.clip_grad to clip the gradients of a resnet18 from torchvision.models. However, I encountered the error below and have no idea how to debug it.
File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/nn/utils/clip_grad.py", line 122, in clip_grad_value_
grouped_grads = _group_tensors_by_device_and_dtype([grads])
File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 42, in _group_tensors_by_device_and_dtype
torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices).items()
RuntimeError: Expected nested_tensorlist[0].size() > 0 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
from torchvision.models import resnet18
from torch.nn.utils import clip_grad_value_
from torch import randn
x = randn(10, 3, 32, 32)
model = resnet18()
loss = model(x).sum()
clip_grad_value_(model.parameters(), 1)
I could reproduce this locally. One of the problems here is that you are not calling .backward() on the loss, so autograd never runs and the list of grads inside clip_grad_value_ is empty.
e.g. this snippet should work:
from torchvision.models import resnet18
from torch.nn.utils import clip_grad_value_
from torch import randn
x = randn(10, 3, 32, 32)
model = resnet18()
loss = model(x).sum()
loss.backward()
clip_grad_value_(model.parameters(), 1)
However, clip_grad_value_ should return without error in the edge case where all the grads are None; I will submit a PR to fix this.
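Until such a fix lands, a defensive pattern is to skip clipping when no gradients exist yet. The wrapper name below is hypothetical, not part of PyTorch; it is a minimal sketch of the idea:

```python
import torch
from torch.nn.utils import clip_grad_value_

def safe_clip_grad_value_(parameters, clip_value):
    # Hypothetical wrapper: only pass parameters that actually have a .grad,
    # so the call is a no-op instead of raising when autograd never ran.
    with_grads = [p for p in parameters if p.grad is not None]
    if with_grads:
        clip_grad_value_(with_grads, clip_value)

# A parameter whose backward was never called has grad=None:
p = torch.nn.Parameter(torch.randn(3))
safe_clip_grad_value_([p], 1.0)  # no error, simply does nothing
```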
I think I’m running into the same issue (NGC 24.01 container):
Expected !nested_tensorlist[0].empty() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
This is definitely after backward(), and the weird thing is that I call
clip_grad_norm_(parameters, max_norm, foreach=True)
clip_grad_value_(parameters, max_value, foreach=True)
The former works; I can even call it twice. But the latter does not. I'd expect norm_ and value_ to behave the same…
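One thing worth checking in this situation: if some parameters in the list were never touched by backward() (e.g., a frozen layer), their .grad is still None, and the two functions may handle that differently on the foreach path. A sketch of the workaround, filtering out None grads before clipping (the frozen-layer setup here is an assumption used only to illustrate the situation):

```python
import torch
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

# A model with a frozen second layer, whose grads stay None after backward().
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))
for p in model[1].parameters():
    p.requires_grad_(False)  # frozen -> .grad remains None

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Keep only parameters that actually received a gradient,
# then both clipping calls operate on a non-empty grad list.
with_grads = [p for p in model.parameters() if p.grad is not None]
clip_grad_norm_(with_grads, max_norm=1.0, foreach=True)
clip_grad_value_(with_grads, clip_value=0.5, foreach=True)
```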