This warning appeared while I was training a model. After epoch 0 I paused training to evaluate the model, and when training resumed at epoch 1 the following warning was printed:
[W accumulate_grad.h:170] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance. grad.sizes() = [64, 768, 1, 1], strides() = [768, 1, 1, 1] param.sizes() = [64, 768, 1, 1], strides() = [768, 1, 768, 768] (function operator())
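If it helps, here is a small standalone sketch (not my actual training code, just my guess at the situation) that I believe produces the same class of warning: a channels_last parameter with the sizes from the message above, whose existing .grad is in plain contiguous layout, so the backward pass has to accumulate into a gradient whose strides do not match the parameter's.

import torch
import torch.nn as nn

# A 1x1 conv whose weight has shape [64, 768, 1, 1], as in the warning.
# Converting the module to channels_last gives the weight strides
# (768, 1, 768, 768), matching the param strides in the message.
conv = nn.Conv2d(768, 64, kernel_size=1).to(memory_format=torch.channels_last)
print(conv.weight.stride())  # (768, 1, 768, 768)

# Simulate a leftover gradient in plain contiguous layout, e.g. one
# created before the parameter's memory format changed. Its strides are
# (768, 1, 1, 1), matching the grad strides in the message.
conv.weight.grad = torch.zeros_like(
    conv.weight, memory_format=torch.contiguous_format
)
print(conv.weight.grad.stride())  # (768, 1, 1, 1)

x = torch.randn(2, 768, 8, 8).to(memory_format=torch.channels_last)

# Accumulating the new gradient into the mismatched .grad should emit
# the "gradient layout contract" warning here.
conv(x).sum().backward()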
What causes this warning, and how can I avoid it?
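For reference, I assume the mismatch the warning describes can be spotted by comparing each parameter's strides with its gradient's strides (a hypothetical check; model stands in for my network):

# Hypothetical check (model stands in for my network): list parameters
# whose gradient strides differ from the parameter's own strides, which
# is roughly the mismatch the warning reports.
for name, p in model.named_parameters():
    if p.grad is not None and p.grad.stride() != p.stride():
        print(name, "param:", tuple(p.stride()), "grad:", tuple(p.grad.stride()))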