Why does PyTorch print "[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance."?

It appeared while I was training a model.
After epoch 0, I evaluated the model.
But when the code started epoch 1 of training, this warning appeared:

[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [64, 768, 1, 1], strides() = [768, 1, 1, 1]
param.sizes() = [64, 768, 1, 1], strides() = [768, 1, 768, 768] (function operator())

What is the reason for this warning and how to avoid it?
Thanks!


@I-Love-U did you figure this out? I am getting the same warning and have no clue why it is happening.

[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [64, 32, 1, 1], strides() = [32, 1, 1, 1]
param.sizes() = [64, 32, 1, 1], strides() = [32, 1, 32, 32] (function operator())

My feature vector is 64-d. The only thing I suspect in my code is that I do two separate forward passes through the same network in my training loop.


@Ram_Mohan Do you have any code that can be used to replicate this? And what version of PyTorch are you using?

It’s actually a small part of a big project, so it might be difficult to replicate. Let me see if I can make a small Google Colab notebook that reproduces it so that I can share it.

I am having a similar problem:

Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.

grad.sizes() = [256, 64, 1, 1], strides() = [64, 1, 1, 1]
param.sizes() = [256, 64, 1, 1], strides() = [64, 1, 64, 64] (function operator())

I am using a pretrained resnet50 within my model. Considering the dimensions, I believe it is related to the ResNet part.

I’m running into the same problem. Does anyone have a suggestion for solving it, or for tracing where the warning is raised?

I got rid of this warning by setting inplace=False in the ReLU() layers when building the model. This may be helpful in other cases.
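For context, the change amounts to something like this (a minimal sketch; the layer sizes are made up):

```python
import torch.nn as nn

# Hypothetical block: with inplace=False (the default), ReLU returns a new
# tensor instead of overwriting its input's memory in place.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=False),  # was nn.ReLU(inplace=True)
)
```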


I just encountered the same warning. I found that it was caused by replacing parameter tensors outright, like
conv.weight.data = NEW_WEIGHT
I avoided it by rewriting the code to update the existing tensor in place:
conv.weight.data.fill_(0)
conv.weight.data += NEW_WEIGHT
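An alternative sketch that also keeps the parameter's existing memory layout, using an in-place copy instead of reassigning .data (new_weight here is a hypothetical stand-in for whatever values you want to load):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(768, 64, kernel_size=1)
new_weight = torch.zeros_like(conv.weight)  # hypothetical replacement values

with torch.no_grad():
    conv.weight.copy_(new_weight)  # writes into the existing storage,
                                   # so the parameter's strides are unchanged
```

Assigning `conv.weight.data = new_tensor` swaps in the new tensor together with its (possibly different) layout, while copy_ only overwrites the values.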

My warning went away after I added a contiguous() to the module input:

        features = self.model(input.contiguous())

Another interesting finding is that doing so improves FPS as well.
Without contiguous():
datatime 0.0026903152465820312 itrtime 1.7159864902496338 all 1.718679428100586
With contiguous():
datatime 0.0015590190887451172 itrtime 0.4502217769622803 all 0.4517836570739746
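The underlying effect can be checked directly on the strides (a minimal sketch; the shape is made up):

```python
import torch

# A channels-last batch rearranged with permute() shares the original
# storage, so its strides are no longer row-major ("contiguous").
x = torch.rand(128, 32, 32, 3).permute(0, 3, 1, 2)
print(x.is_contiguous())   # False

y = x.contiguous()         # materializes a dense, row-major copy
print(y.is_contiguous())   # True
```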

Here is an example that triggers the warning:

import torch
from torchvision.models.resnet import resnet18

class model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.core = resnet18()

    def forward(self, input):
        return self.core(input).sum()                  # warns
        # return self.core(input.contiguous()).sum()   # safe

class dataloader():
    def next(self):
        return [torch.rand([16, 3, 128, 128]).cuda(),
                torch.rand([128, 32, 32, 3]).permute(0, 3, 1, 2).cuda()]

class trainer():
    def __init__(self):
        self.model = model()
        self.dataloader = dataloader()
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=0.009, weight_decay=0.0005)

    def train(self):
        for i in range(9):
            self.optimizer.zero_grad()
            data = self.dataloader.next()
            loss0 = self.model(data[0])   # contiguous input
            loss0.backward()
            loss1 = self.model(data[1])   # permuted, non-contiguous input
            loss1.backward()
            self.optimizer.step()
        print("done")

t = trainer()
t.model.cuda()
t.train()
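Since several posters asked how to trace which parameters are affected, here is a simplified check (an assumption: it only covers the common case where dense parameters and their grads should share strides, which is what the warning text above compares):

```python
import torch

def check_grad_layout(model):
    """Report parameters whose gradient strides differ from the parameter's
    strides -- a simplified version of the gradient layout contract check."""
    mismatched = []
    for name, p in model.named_parameters():
        if p.grad is not None and p.grad.stride() != p.stride():
            mismatched.append((name, tuple(p.stride()), tuple(p.grad.stride())))
    return mismatched
```

Call it right after loss.backward() and before optimizer.step(); an empty list means grads and params agree.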


For me, the problem was that I applied changes to the data coming from the train DataLoader but forgot to apply them to the data coming from the test DataLoader.

Also seeing this with no obvious reason why:

[W accumulate_grad.h:185] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [12, 48, 1, 1], strides() = [48, 1, 1, 1]
param.sizes() = [12, 48, 1, 1], strides() = [48, 1, 48, 48] (function operator())

Looks very much like a regression in PyTorch (running 1.9.1 here).

I encountered the same issue. Is there any way to figure out where the problem is, or how to debug it? I tried to turn the warning into an error with the following, but it does not work:

warnings.filterwarnings("error")

Hello everyone.
In a previous project where this problem occurred, I used a vision transformer (such as ViT) to extract image features, and connecting it to other convolutional structures involved a large number of operations that adjust the shape of the tensor.
This may be one of the reasons my code has this problem, but I don’t know exactly why.

Hi.
I’m encountering the same problem.
Any ideas what this warning means and what could be going wrong?

@ptrblck Could you give it a quick look?

In my case, this is what is printed after the first epoch (which takes much longer than usual):

[W accumulate_grad.h:184] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [98, 256, 1, 1], strides() = [256, 1, 1, 1]
param.sizes() = [98, 256, 1, 1], strides() = [256, 1, 256, 256] (function operator())

Thanks in advance.

Are you also slicing the inputs before passing them to the model (without calling .contiguous())?
If so, you might want to make the actual input contiguous to remove this warning (otherwise PyTorch would do it internally).
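For illustration, slicing is one common way a non-contiguous input sneaks in (a hypothetical sketch; the shapes are made up):

```python
import torch

batch = torch.rand(8, 3, 224, 224)
view = batch[:, :, ::2, ::2]   # strided slicing returns a view, not a copy
print(view.is_contiguous())    # False

x = view.contiguous()          # dense copy; safe to feed to the model
```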

I have tried adding the .contiguous() to the input tensor, but the warning is still printed.

This is the input that is passed to the model.

# Obtain inputs from DataLoader.
# Normalize images by dividing by 255
# Rearrange dimensions for input.
# Cast to float. Move to GPU. Make contiguous
input = torch.div(images['image'],255).permute((0, 3, 1, 2)).float().to(device).contiguous()

Should I just ignore the warning?

Yes, you could ignore the warning and (if possible) post a minimal code snippet so that we could check where the layout mismatch occurs.

It may be triggered elsewhere in the network if you did indexing, permuting, shuffling, or similar operations inside the network.
And, FWIW, the warning packs quite a punch on training speed.