Comparing PyTorch gradients with my own implementation

Hi there!

I am trying to build some PyTorch functions, such as 2D convolution, from scratch using numpy.
To do this, I need to check that each implementation is correct before I move on to the next one. I got the forward pass right; for the backward pass, however, it is not so simple to compare PyTorch's output with mine.

Basically, what I am trying to compare are the gradients of the loss w.r.t. the weights of the convolution (without bias for now). Let me show you how I wanted to achieve this:

import numpy as np
import torch
import torch.nn as nn

# create some random image for the input for the convolution and one for calculating the loss
input_numpy = np.random.uniform(size=(3, 3, 64, 64))
input_tensor = torch.from_numpy(input_numpy).float()
y_true = torch.randn(size=(3, 3, 30, 30))

# init the pytorch model and transfer the weights to 'my' model
# (Conv2D is my own numpy implementation)
torch_conv2d = nn.Conv2d(3, 3, 6, 2, 0, 1, bias=False)
my_conv2d = Conv2D(3, 3, 6, 2, 0, 1, bias=False)

# calculate the forward pass for both models, those are equal (I checked that already)
torch_pred = torch_conv2d(input_tensor)
my_pred = my_conv2d.forward(input_numpy)
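
For reference, the forward comparison described above could be sketched roughly as follows. This is only an illustration: `conv2d_forward` is a hypothetical stand-in for my own `Conv2D.forward` (naive, no padding, no dilation, no bias), and the `atol` value is an assumption chosen to absorb float32 rounding.

```python
import numpy as np
import torch
import torch.nn as nn


def conv2d_forward(x, w, stride=1):
    """Naive NumPy cross-correlation (hypothetical helper): no padding, dilation or bias."""
    n, _, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    h_out = (h - kh) // stride + 1
    w_out = (wd - kw) // stride + 1
    out = np.zeros((n, c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # contract over (in_channels, kh, kw) for every sample and output channel
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out


torch.manual_seed(0)
torch_conv2d = nn.Conv2d(3, 3, 6, 2, 0, 1, bias=False)
input_numpy = np.random.uniform(size=(3, 3, 64, 64))
input_tensor = torch.from_numpy(input_numpy).float()

# transfer the PyTorch weights to the NumPy side
weights = torch_conv2d.weight.detach().numpy().astype(np.float64)

torch_pred = torch_conv2d(input_tensor)
my_pred = conv2d_forward(input_numpy, weights, stride=2)

# compare with a tolerance, since PyTorch computes in float32 here
forward_matches = np.allclose(torch_pred.detach().numpy(), my_pred, atol=1e-4)
print(forward_matches)
```

Comparing with a tolerance rather than exact equality seems the safer choice, because the two sides run in different floating-point precisions.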

Now, to make my backward pass work, I need the gradient of the loss w.r.t. the predictions.
If I understand correctly, the loss function (as an example)

criterion = nn.MSELoss()

does not return the gradients. I can only get the gradients of the weights by calling

loss = criterion(torch_pred, y_true)
loss.backward()
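
One possible way to also see the gradient w.r.t. the prediction after a regular backward call is `retain_grad()`, which asks autograd to keep `.grad` on a non-leaf tensor. The sketch below assumes the same layer configuration as above; the random input and target are placeholders, not real data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
torch_conv2d = nn.Conv2d(3, 3, 6, 2, 0, 1, bias=False)
input_tensor = torch.rand(3, 3, 64, 64)   # placeholder input
y_true = torch.randn(3, 3, 30, 30)        # placeholder target
criterion = nn.MSELoss()

torch_pred = torch_conv2d(input_tensor)
torch_pred.retain_grad()  # keep .grad for this intermediate (non-leaf) tensor

loss = criterion(torch_pred, y_true)
loss.backward()

grad_wrt_pred = torch_pred.grad             # dLoss/dPrediction
grad_wrt_weight = torch_conv2d.weight.grad  # dLoss/dWeight
print(grad_wrt_pred.shape, grad_wrt_weight.shape)
```

Without the `retain_grad()` call, only leaf tensors such as the weights would have `.grad` populated after `backward()`.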

I tried getting the gradients using autograd:

from torch.autograd import grad as autograd

y_hat = torch.autograd.Variable(torch_pred.detach(), requires_grad=True)
grad = autograd(((y_hat - y_true) ** 2).sum(), y_hat)

Am I correct that the first item of grad (grad[0]) is the gradient of the MSE loss w.r.t. the prediction, and thus the required input for my backward pass?
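
As a sanity check on this, the gradient returned by autograd can be compared against the closed-form derivative of the squared error. The sketch below uses a random tensor as a stand-in for the prediction (an assumption, so that it runs on its own). It also shows that a summed squared error and `nn.MSELoss` with its default `reduction="mean"` differ by a factor of the number of elements, which matters when feeding the result into a hand-written backward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
y_hat = torch.randn(3, 3, 30, 30, requires_grad=True)  # stand-in for the prediction
y_true = torch.randn(3, 3, 30, 30)

# sum of squared errors, as in the snippet above
(grad_sum,) = torch.autograd.grad(((y_hat - y_true) ** 2).sum(), y_hat)

# nn.MSELoss with its default reduction="mean"
(grad_mean,) = torch.autograd.grad(nn.MSELoss()(y_hat, y_true), y_hat)

# closed-form derivatives: 2 * (y_hat - y_true), divided by N for the mean
diff = (y_hat - y_true).detach()
sum_ok = torch.allclose(grad_sum, 2 * diff)
mean_ok = torch.allclose(grad_mean, 2 * diff / y_hat.numel())
print(sum_ok, mean_ok)
```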

The line in your snippet that creates y_hat explicitly detaches the output tensor and wraps it in a new, deprecated Variable, which cuts it off from the computation graph of the convolution.
I would assume you want to check the weight and data gradients as seen here:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 3)
x = torch.randn(1, 3, 24, 24, requires_grad=True)
out = conv(x)

# torch.autograd.grad returns a tuple with one entry per requested input
# grad w.r.t. weight
wgrad = torch.autograd.grad(out, conv.weight, torch.ones_like(out), retain_graph=True)
# grad w.r.t. input
dgrad = torch.autograd.grad(out, x, torch.ones_like(out))
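
To compare against a from-scratch implementation, the weight gradient from autograd could then be checked against a NumPy version along these lines. This is a minimal sketch, not a reference implementation: `conv2d_weight_grad` is a hypothetical helper that assumes no padding and no dilation, a random upstream gradient is used instead of ones to make the check a bit stricter, and the tolerances are assumptions for float32 rounding.

```python
import numpy as np
import torch
import torch.nn as nn


def conv2d_weight_grad(x, grad_out, kernel_size, stride=1):
    """Naive dL/dW for a convolution without padding or dilation (hypothetical helper)."""
    kh, kw = kernel_size
    _, c_out, h_out, w_out = grad_out.shape
    c_in = x.shape[1]
    grad_w = np.zeros((c_out, c_in, kh, kw))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # sum over the batch: (n, c_out) with (n, c_in, kh, kw) -> (c_out, c_in, kh, kw)
            grad_w += np.tensordot(grad_out[:, :, i, j], patch, axes=([0], [0]))
    return grad_w


torch.manual_seed(0)
conv = nn.Conv2d(3, 3, 3)
x = torch.randn(1, 3, 24, 24)
out = conv(x)
grad_out = torch.randn_like(out)  # upstream gradient dL/d(out)

# reference weight gradient from autograd
(wgrad,) = torch.autograd.grad(out, conv.weight, grad_out)

my_wgrad = conv2d_weight_grad(
    x.numpy().astype(np.float64),
    grad_out.numpy().astype(np.float64),
    kernel_size=(3, 3),
)

weight_grad_matches = np.allclose(wgrad.numpy(), my_wgrad, rtol=1e-4, atol=1e-4)
print(weight_grad_matches)
```

The same pattern should carry over to the input gradient: pass the same upstream gradient to both sides and compare the results with a tolerance.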