Hello All,

This is my first post, so I am sorry if I am posting in the wrong place or if this question is too trivial, but I can't really figure out what is going wrong here. To get to the point: I have a tensor that looks like this

```
X = torch.ones((6, 8))
X[:, 2:6] = 0
X
tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])
```

And I have a kernel like this (a simple vertical edge-detection kernel):

`K = torch.tensor([[1, -1]])`

When I apply this kernel to the tensor, I get the desired result:

```
Y
tensor([[ 0., 1., 0., 0., 0., -1., 0.],
        [ 0., 1., 0., 0., 0., -1., 0.],
        [ 0., 1., 0., 0., 0., -1., 0.],
        [ 0., 1., 0., 0., 0., -1., 0.],
        [ 0., 1., 0., 0., 0., -1., 0.],
        [ 0., 1., 0., 0., 0., -1., 0.]])
```
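For reference, I produced `Y` with a plain cross-correlation loop, something like this (a minimal sketch; my exact implementation may differ slightly, e.g. I use a float kernel here):

```python
import torch

def corr2d(X, K):
    """2D cross-correlation with stride 1 and no padding."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # element-wise product of the current window with the kernel
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])
Y = corr2d(X, K)  # shape (6, 7), matching the output above
```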

Now I want to learn this kernel from the data. Here is what my code looks like:

```
import torch
from torch import nn

conv = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
X = X.reshape(1, 1, 6, 8)
Y = Y.reshape(1, 1, 6, 7)
for i in range(10):
    Y_hat = conv(X)
    loss = (Y_hat - Y) ** 2
    conv.zero_grad()
    loss.sum().backward()
    conv.weight.data[:] -= 3e-2 * conv.weight.grad
    if i % 2 == 0:
        print(f"Loss at {i} epoch is {loss.sum()}")
```

When I run this for 10 epochs (as shown above) and then print the kernel's weights, I get:

```
Parameter containing:
tensor([[[[ 0.9466, -1.0235]]]], requires_grad=True)
```

which is very close to the true values.

However, when I change the loss calculation to use `nn.MSELoss`, like so:

```
...
loss = nn.MSELoss()(Y_hat, Y)
...
loss.backward()
```
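For completeness, here is the full loop with that change, everything else identical (self-contained version; I build the target `Y` directly from the output shown earlier):

```python
import torch
from torch import nn

# same data as before
X = torch.ones((6, 8))
X[:, 2:6] = 0
# target: the vertical-edge response shown earlier, repeated for all 6 rows
Y = torch.tensor([[0., 1., 0., 0., 0., -1., 0.]]).repeat(6, 1)

conv = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
X = X.reshape(1, 1, 6, 8)
Y = Y.reshape(1, 1, 6, 7)
criterion = nn.MSELoss()
for i in range(10):
    Y_hat = conv(X)
    loss = criterion(Y_hat, Y)  # mean over all 6 * 7 output elements
    conv.zero_grad()
    loss.backward()
    conv.weight.data[:] -= 3e-2 * conv.weight.grad
    if i % 2 == 0:
        print(f"Loss at {i} epoch is {loss.item()}")
```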

training for only 10 epochs does not approximate the kernel at all; I get something like `tensor([[[[-0.0427, 0.2747]]]], requires_grad=True)`.

And when I train it for much longer (say 150 epochs or so), it produces a somewhat better approximation:

`tensor([[[[ 0.7467, -0.7456]]]], requires_grad=True)`

I am curious why there is such a huge difference in the number of epochs required just because I changed the loss calculation slightly.

Thanks in advance for pointing me to any good resources.

Regards,

Shubhadeep