This warning appeared while I was training the model.
After the 0th epoch, I evaluated the model.
But when the code starts the 1st training epoch, this warning appears:
[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [64, 768, 1, 1], strides() = [768, 1, 1, 1]
param.sizes() = [64, 768, 1, 1], strides() = [768, 1, 768, 768] (function operator())
What is the reason for this warning, and how can I avoid it?
Thanks!
@I-Love-U did you figure this out? I am getting the same warning and have no clue why it is happening.
[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [64, 32, 1, 1], strides() = [32, 1, 1, 1]
param.sizes() = [64, 32, 1, 1], strides() = [32, 1, 32, 32] (function operator())
My feature vector is 64-d. The only thing I suspect in my code is that I am doing two separate forward passes through the same network in my training loop.
It's actually a small part of a big project, so it might be difficult to replicate. Let me see if I can put together a small Google Colab notebook that reproduces this so I can share it.
I just encountered the same warning. I found that it was caused by initializing parameters via direct assignment, like conv.weight.data = NEW_WEIGHT.
However, I avoided it by rewriting the code as conv.weight.data.fill_(0) followed by conv.weight.data += NEW_WEIGHT.
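A minimal sketch of the two patterns, assuming a made-up 1x1 conv; NEW_WEIGHT is just a placeholder for whatever custom initialization tensor is used:

import torch
import torch.nn as nn

conv = nn.Conv2d(768, 64, kernel_size=1)

# NEW_WEIGHT stands in for a custom initialization tensor; if its memory
# layout (strides) differs from the parameter's default contiguous layout,
# direct assignment replaces the parameter's layout with it.
NEW_WEIGHT = torch.randn(64, 768, 1, 1)

# Pattern that can trigger the warning:
# conv.weight.data = NEW_WEIGHT

# In-place alternative: the parameter keeps its original memory layout
# and only its values change.
conv.weight.data.fill_(0)
conv.weight.data += NEW_WEIGHT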
My warning went away after adding a .contiguous() to the module input:
features = self.model(input.contiguous())
Another interesting finding is that doing so improves FPS as well.
Without contiguous():
datatime 0.0026903152465820312 itrtime 1.7159864902496338 all 1.718679428100586
With contiguous():
datatime 0.0015590190887451172 itrtime 0.4502217769622803 all 0.4517836570739746
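A minimal sketch of that fix, with self.model replaced by a stand-in conv and the input shapes made up: a permuted NHWC batch is non-contiguous, and .contiguous() copies it into the default layout before the forward pass.

import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # stand-in for self.model

nhwc_batch = torch.randn(8, 224, 224, 3)   # images loaded channels-last (NHWC)
input = nhwc_batch.permute(0, 3, 1, 2)     # NCHW view, but non-contiguous
print(input.is_contiguous())               # False

features = model(input.contiguous())       # copy into a contiguous layout first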
For me, the problem was that I applied the changes to the data coming from the train DataLoader but forgot to apply them to the data coming from the test DataLoader.
[W accumulate_grad.h:185] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [12, 48, 1, 1], strides() = [48, 1, 1, 1]
param.sizes() = [12, 48, 1, 1], strides() = [48, 1, 48, 48] (function operator())
This looks very much like a regression bug in PyTorch (running 1.9.1 here).
I encountered the same issue. Is there any way to figure out where the problem is, or how to debug it? I tried to raise an error when the warning appears using the following, but it does not work.
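One way to narrow it down (a sketch; the model and loss below are placeholders for your own objects): after backward(), compare each parameter's strides with the strides of its gradient and print the mismatches, since those strides are exactly what the warning reports.

import torch
import torch.nn as nn

# `model` and `loss` are placeholders for your own network and loss value.
model = nn.Conv2d(48, 12, kernel_size=1)
loss = model(torch.randn(4, 48, 1, 1)).sum()

loss.backward()
for name, p in model.named_parameters():
    if p.grad is not None and p.grad.stride() != p.stride():
        # These are the parameters whose gradients do not match the layout contract.
        print(name, "param strides:", p.stride(), "grad strides:", p.grad.stride())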
Hello everyone.
In a previous project where this problem occurred, I used a vision transformer (such as ViT) to extract image features, and connecting it to other convolutional structures involved a large number of operations that reshape the tensor.
This may be one of the reasons my code runs into the problem, but I am not sure exactly why.
In my case, this is what is printed after the first epoch (which takes much longer than usual):
[W accumulate_grad.h:184] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [98, 256, 1, 1], strides() = [256, 1, 1, 1]
param.sizes() = [98, 256, 1, 1], strides() = [256, 1, 256, 256] (function operator())
Are you also slicing the inputs before passing them to the model (without calling .contiguous())?
If so, you might want to make the input contiguous yourself to remove this warning (otherwise it would be done internally).
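For reference, a small sketch (tensor shapes made up) of why slicing matters: strided slicing returns a non-contiguous view.

import torch

x = torch.randn(16, 3, 224, 224)
sliced = x[:, :, ::2, ::2]          # strided slicing returns a non-contiguous view
print(sliced.is_contiguous())       # False

model_input = sliced.contiguous()   # force a contiguous copy before the forward pass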
I have tried adding .contiguous() to the input tensor, but the warning is still printed.
This is the input that is passed to the model.
# Obtain inputs from the DataLoader.
# Normalize images by dividing by 255.
# Rearrange dimensions to NCHW for the model.
# Cast to float, move to GPU, make contiguous.
input = torch.div(images['image'], 255).permute((0, 3, 1, 2)).float().to(device).contiguous()
It may be triggered elsewhere in the network, e.g. if you do indexing, permuting, shuffling, or similar operations inside the network.
And, from what I see, the warning takes quite a toll on training speed.