Why is the backpropagated gradient different from the brute-force (finite-difference) one?

Backward (autograd) way:

x0 = 0; y0 = 0
t_out = net(target_img)  # t_out.shape: [1, 10]
t_out[0, 0].backward()
print(target_img.grad[0, 0, x0, y0])  # gradient of the class-0 score w.r.t. pixel (x0, y0)
# prints 0.0007

Brute-force way:

x0 = 0; y0 = 0
h = 1e-3
# (snippet from inside a helper that returns the numerical gradient)
with torch.no_grad():
    img0 = img.clone()
    img0[0, 0, x0, y0] += h                  # perturb one pixel by h
    imgcat = torch.cat((img0, img), dim=0)   # batch: [perturbed, original]
    out = forward_target(net, imgcat, target)
    return (out[0] - out[1]) / h             # forward difference
# result is 4122

I'm not sure, but maybe it has something to do with using net(target_img) vs. forward_target(net, imgcat, target). There must be some issue in your code; I have compared numerical, manual, and autograd gradients many times in the past for complex neural nets and never found a discrepancy in PyTorch.
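For what it's worth, here is a minimal sketch of how I would set up such a comparison so that both gradients go through exactly the same forward function (the scalar_out and compare_grads helpers and the [1, 1, H, W] image shape are my assumptions, not your actual code):

import torch

def scalar_out(net, img):
    # one forward function used by both methods; picks the class-0 score of the first sample
    return net(img)[0, 0]

def compare_grads(net, img, x0=0, y0=0, h=1e-3):
    # autograd gradient w.r.t. one input pixel
    img_a = img.clone().detach().requires_grad_(True)
    scalar_out(net, img_a).backward()
    g_autograd = img_a.grad[0, 0, x0, y0].item()

    # finite difference through the *same* function
    with torch.no_grad():
        img_p = img.clone()
        img_p[0, 0, x0, y0] += h
        g_numeric = ((scalar_out(net, img_p) - scalar_out(net, img)) / h).item()

    return g_autograd, g_numeric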

I would make your delta h much smaller, though. E.g., when I use 1e-10 in this super simplistic example, I get exactly the same results:

import torch

x = torch.tensor([0.5], requires_grad=True, dtype=torch.float64)
out = torch.sigmoid(x)
out.backward()

x.grad

vs

torch.sigmoid(x) * (1.-torch.sigmoid(x))

vs

(torch.sigmoid(x) - torch.sigmoid(x + 1e-10)) / (x - (x + 1e-10))

but at 1e-3 the numerical gradient falls apart (a ~10% discrepancy in that case), and the issue is probably even larger in deep architectures. However, “4122” vs. “0.0007” is a pretty big difference, so I guess your issue has something to do with the fact that you are using different functions!?
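If you want to see how the discrepancy depends on h in that toy example, you could sweep over a few step sizes (just a sketch; the exact errors will depend on the dtype and on the point where you evaluate):

import torch

x = torch.tensor([0.5], dtype=torch.float64)
exact = torch.sigmoid(x) * (1. - torch.sigmoid(x))  # analytic derivative

for h in (1e-1, 1e-3, 1e-6, 1e-10):
    numeric = (torch.sigmoid(x + h) - torch.sigmoid(x)) / h  # forward difference
    rel_err = ((numeric - exact) / exact).abs().item()
    print(f"h={h:.0e}  relative error={rel_err:.2e}")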

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.avg_pool = nn.AvgPool2d(kernel_size=2)
        self.sigmoid = nn.Sigmoid()
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.sigmoid(self.avg_pool(self.conv1(x)))
        x = self.sigmoid(self.avg_pool(self.conv2_drop(self.conv2(x))))
        x = x.view(-1, 320)
        x = F.sigmoid(self.fc1(x))
        x = F.dropout(x, training=self.training)  # second dropout; like conv2_drop, active only in training mode
        x = self.fc2(x)
        return self.log_softmax(x)

The net I used is the one above.

Hm, have you tried setting your model to eval mode with model.eval() before calling that function? Maybe dropout is responsible.
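Something along these lines before re-running both measurements (a sketch; I'm assuming target_img keeps the [1, 1, H, W] shape from your snippet):

net.eval()  # turns nn.Dropout2d and F.dropout(..., training=self.training) into no-ops

# autograd gradient, now with a deterministic forward pass
target_img.grad = None  # clear any stale gradient from earlier runs
t_out = net(target_img)
t_out[0, 0].backward()
print(target_img.grad[0, 0, x0, y0])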

You’re right.
The result is close to the expected one after calling model.eval().
But it is still sensitive to the choice of h.
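For reference, one way to make the numerical check less sensitive to h is to run the whole comparison in double precision and use a central difference (a sketch; note that .double() converts the model's parameters in place, and h = 1e-5 is just an assumed step size):

net_d = net.double().eval()
img_d = target_img.detach().double()

h = 1e-5
with torch.no_grad():
    img_plus = img_d.clone()
    img_minus = img_d.clone()
    img_plus[0, 0, x0, y0] += h
    img_minus[0, 0, x0, y0] -= h
    # central difference: truncation error shrinks as O(h^2) instead of O(h)
    num_grad = (net_d(img_plus)[0, 0] - net_d(img_minus)[0, 0]) / (2 * h)
print(num_grad.item())

PyTorch's built-in torch.autograd.gradcheck does a similar double-precision numerical comparison and can serve as an automated sanity check.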