Why does pytorch prompt "[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance."?

It appeared while I was training the model.
After the 0th epoch, I evaluated the model.
But when the code started the 1st epoch of training, this warning appeared:

[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [64, 768, 1, 1], strides() = [768, 1, 1, 1]
param.sizes() = [64, 768, 1, 1], strides() = [768, 1, 768, 768] (function operator())

What is the reason for this warning and how to avoid it?
Thanks!


@I-Love-U did you figure this out? I am getting the same warning and have no clue why it is happening.

[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [64, 32, 1, 1], strides() = [32, 1, 1, 1]
param.sizes() = [64, 32, 1, 1], strides() = [32, 1, 32, 32] (function operator())

My feature vector is 64-dimensional. The only thing I suspect in my code is that I am doing two separate forward passes through the same network in my training loop.


@Ram_Mohan Do you have any code that can be used to replicate this? And what version of PyTorch are you using?

It’s actually a small part of a big project, so it might be difficult to replicate. Let me see if I can put together a small Google Colab notebook that reproduces it, and I will share that.

I am having a similar problem:

Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.

grad.sizes() = [256, 64, 1, 1], strides() = [64, 1, 1, 1]
param.sizes() = [256, 64, 1, 1], strides() = [64, 1, 64, 64] (function operator())

I am using a pretrained resnet50 within my model. Judging by the dimensions, I believe it is related to the ResNet part.

I’m running into the same problem. Does anyone have a suggestion for solving it, or for tracing where the warning is raised?

I got rid of this warning by setting inplace=False in the ReLU() layers when building the model. This may help in other cases as well.
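A minimal sketch of that change, with placeholder layer sizes:

import torch.nn as nn

# With inplace=False (the default), ReLU writes its output to a new tensor
# instead of overwriting its input's memory in place.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=1),
    nn.ReLU(inplace=False),  # instead of nn.ReLU(inplace=True)
)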

I just encountered the same warning. I found that it was caused by replacing parameters via direct assignment, like
conv.weight.data = NEW_WEIGHT
I avoided this by rewriting the code to update the existing tensor in place:
conv.weight.data.fill_(0)
conv.weight.data += NEW_WEIGHT
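For context, a minimal sketch of the two patterns, using a placeholder 1x1 conv layer and placeholder weights:

import torch
import torch.nn as nn

conv = nn.Conv2d(768, 64, kernel_size=1)   # placeholder layer
new_weight = torch.randn(64, 768, 1, 1)    # placeholder replacement values

# Direct assignment swaps in the new tensor along with its memory layout:
# conv.weight.data = new_weight

# In-place updates write into the existing parameter, so its original
# (contiguous) layout is preserved; conv.weight.data.copy_(new_weight)
# is another in-place option.
conv.weight.data.fill_(0)
conv.weight.data += new_weight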

My warning went away after adding a contiguous() to the module input:

        features = self.model(input.contiguous())

Another interesting finding is that doing so improves FPS as well.
Without contiguous():
datatime 0.0026903152465820312 itrtime 1.7159864902496338 all 1.718679428100586
With contiguous():
datatime 0.0015590190887451172 itrtime 0.4502217769622803 all 0.4517836570739746
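If you want to check whether this applies to your own inputs, is_contiguous() is a quick way to spot tensors whose layout was changed by operations such as permute() (a minimal sketch with arbitrary shapes):

import torch

x = torch.rand(128, 32, 32, 3).permute(0, 3, 1, 2)  # NHWC -> NCHW view, no copy
print(x.is_contiguous())                  # False: only the strides changed
print(x.contiguous().is_contiguous())     # True: contiguous() makes a packed copy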

Here is an example that triggers the warning:

import torch
from torchvision.models.resnet import resnet18


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.core = resnet18()

    def forward(self, x):
        return self.core(x).sum()                 # warns on non-contiguous input
        # return self.core(x.contiguous()).sum()  # safe


class DataLoader:
    def next(self):
        # The second tensor is made non-contiguous by permute(),
        # which is what triggers the warning.
        return [
            torch.rand([16, 3, 128, 128]).cuda(),
            torch.rand([128, 32, 32, 3]).permute(0, 3, 1, 2).cuda(),
        ]


class Trainer:
    def __init__(self):
        self.model = Model()
        self.dataloader = DataLoader()
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=0.009, weight_decay=0.0005)

    def train(self):
        for i in range(9):
            self.optimizer.zero_grad()
            data = self.dataloader.next()
            loss0 = self.model(data[0])
            loss0.backward()
            loss1 = self.model(data[1])
            loss1.backward()
            self.optimizer.step()
        print("done")


t = Trainer()
t.model.cuda()
t.train()


For me, the problem was that I applied changes to the data coming from the train dataloader but forgot to apply them to the data coming from the test dataloader.

Also seeing this with no obvious reason why:

[W accumulate_grad.h:185] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [12, 48, 1, 1], strides() = [48, 1, 1, 1]
param.sizes() = [12, 48, 1, 1], strides() = [48, 1, 48, 48] (function operator())

Looks very much like a regression bug in PyTorch (running 1.9.1 here).