# What's the difference in gradient backprop between slicing and nn.Unfold?

My code used to extract patches by slicing in a loop, which was slow. I recently replaced the loop with nn.Unfold, which computes the same result. However, the model now converges to a worse state, and its evaluation metrics have deteriorated. After inspecting it, I found that although the two approaches produce the same output, the gradients produced by backpropagation are different. What should I do? My environment is python==3.6.10, torch==1.4.0. Here is a simple code snippet reproducing the difference:

``````
import torch
import torch.nn as nn
import torch.nn.functional as F

class simple_net(nn.Module):
    def __init__(self):
        super(simple_net, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, 1, 1)
        self.conv2 = nn.Conv2d(8, 3, 3, 1, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x

model = simple_net().cuda()
x = torch.randn([6,3,8,8], dtype=torch.float32).cuda()
result = model(x)
b, c, h, w = result.shape

# Assumed definitions (not shown in the snippet): zero-pad by 1 pixel,
# then take every 3x3 window with nn.Unfold.
result_pad_1 = F.pad(result, (1, 1, 1, 1))
result_unfold = nn.Unfold(kernel_size=3)(result_pad_1)  # (b, c*9, h*w)

# The same windows, gathered by explicit slicing.
slice_list = []
for i in range(h):
    for j in range(w):
        slicetmp = result_pad_1[..., i:i+3, j:j+3].reshape(b, -1)
        slice_list.append(slicetmp)
result_slice = torch.stack(slice_list, dim=2)  # (b, c*9, h*w)

print("forward result:, ", str(torch.sum(torch.abs(result_unfold - result_slice))))
loss = result_unfold.abs().sum() + result_slice.abs().sum()
loss.backward()
``````

outputs:

``````
forward result:,  tensor(0., device='cuda:0', grad_fn=<SumBackward0>)
``````

If you’d like to check whether the two outputs have the same values, summing the absolute difference is one heuristic, but you can also use `torch.allclose(a, b)`, which compares the tensors element-wise within a tolerance.

If you’d like to check whether the two functions produce the same gradient, use `grad_inp = torch.autograd.grad(output, inp)` to compute the gradient of each output with respect to the input separately, and then compare the two results with `torch.allclose`. Note that `torch.autograd.grad` returns a tuple, and `output` must be a scalar unless you also pass `grad_outputs`.
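
Here is a minimal sketch of that comparison. It is not the original model: `inp` is a hypothetical random tensor standing in for the network output, the helper names `windows_by_slicing` and `windows_by_unfold` are made up for illustration, and it runs on CPU in float64 so that rounding noise stays small.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the network output (an assumption, not the original model).
inp = torch.randn(2, 3, 8, 8, dtype=torch.float64, requires_grad=True)
b, c, h, w = inp.shape

def windows_by_slicing(t):
    # Zero-pad by 1, then collect every 3x3 window with an explicit loop.
    padded = F.pad(t, (1, 1, 1, 1))
    cols = [padded[..., i:i + 3, j:j + 3].reshape(b, -1)
            for i in range(h) for j in range(w)]
    return torch.stack(cols, dim=2)  # shape (b, c*9, h*w)

def windows_by_unfold(t):
    # Same windows via unfold; padding=1 plays the role of F.pad above.
    return F.unfold(t, kernel_size=3, padding=1)  # shape (b, c*9, h*w)

out_slice = windows_by_slicing(inp)
out_unfold = windows_by_unfold(inp)

# Forward check: element-wise comparison within a tolerance.
same_forward = torch.allclose(out_slice, out_unfold)

# Backward check: differentiate one scalar loss per implementation, separately.
# torch.autograd.grad returns a tuple with one entry per input.
(grad_slice,) = torch.autograd.grad(out_slice.abs().sum(), inp)
(grad_unfold,) = torch.autograd.grad(out_unfold.abs().sum(), inp)
same_backward = torch.allclose(grad_slice, grad_unfold)

print("forward match:", same_forward)
print("backward match:", same_backward)
```

Computing the two gradients separately matters: the snippet in the question adds both losses and calls `backward()` once, which accumulates a single combined gradient and so cannot show whether the two paths disagree. In float32 on a GPU, small differences may still appear because the two backward passes can accumulate values in a different order, so it may be worth loosening `rtol` and `atol` in `torch.allclose` before concluding that the gradients really differ.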