Not sure, but maybe it has something to do with using net(target_img) vs. forward_target(net, imgcat, target). There must be some issue in your code; I have compared numerical, manual, and autograd gradients many times in the past for complex neural nets and never found a discrepancy in PyTorch.
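If you want a quick sanity check, PyTorch ships a built-in utility, torch.autograd.gradcheck, that compares autograd gradients against numerical ones for you. A minimal sketch (torch.sigmoid here is just a stand-in for however you call your model):

import torch

# gradcheck perturbs each input element numerically and compares the
# result against autograd; inputs should be float64 with requires_grad
x = torch.randn(4, dtype=torch.float64, requires_grad=True)
print(torch.autograd.gradcheck(torch.sigmoid, (x,)))  # True if they agree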
I would make your delta h much smaller, though. E.g., when I use 1e-10 in this super simplistic example, I get the exact same results:
import torch

x = torch.tensor([0.5], requires_grad=True, dtype=torch.float64)
out = torch.sigmoid(x)
out.backward()
print(x.grad)  # autograd gradient of sigmoid at x = 0.5
but at 1e-3 the numerical gradient falls apart (a 10% discrepancy in that case), and the issue is probably even larger in deep architectures. However, "4122" vs. "0.0007" is a pretty big difference, so I guess your issue has something to do with the fact that you are using different functions!?
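For reference, here is a minimal sketch of one way to run that comparison for different step sizes, using a simple forward difference against the autograd result (your actual check may of course differ):

import torch

x = torch.tensor([0.5], requires_grad=True, dtype=torch.float64)
torch.sigmoid(x).backward()  # autograd gradient lands in x.grad
for h in (1e-10, 1e-3):
    with torch.no_grad():  # forward-difference numerical gradient
        numerical = (torch.sigmoid(x + h) - torch.sigmoid(x)) / h
    print(f"h={h:g}  numerical={numerical.item():.10f}  autograd={x.grad.item():.10f}")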