Hello All,
This is my first post, and I'm sorry if I'm posting in the wrong place or if this is too trivial, but I can't really figure out what is going wrong here. I'll come straight to the point. I have a tensor that looks like this:
import torch

X = torch.ones((6, 8))
X[:, 2:6] = 0
X
tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])
And I have a kernel that looks like this (a simple vertical edge-detection kernel):
K = torch.tensor([[1, -1]])
When I apply this kernel to the tensor, I get the desired result:
Y
tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])
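(For reference, here is a minimal sketch of how Y above can be computed. F.conv2d actually performs cross-correlation, which is exactly "sliding K over X" here; I cast K to float since conv2d needs the input and kernel dtypes to match.)

import torch.nn.functional as F

# Sketch: cross-correlate X with K to produce the edge map above.
Y = F.conv2d(X.reshape(1, 1, 6, 8),           # (batch, channels, H, W)
             K.float().reshape(1, 1, 1, 2))   # (out_ch, in_ch, kH, kW)
Y = Y.reshape(6, 7)                           # back to a plain 2-D map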
Now I want to learn this kernel. Here is what my code looks like:
from torch import nn

conv = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
X = X.reshape(1, 1, 6, 8)  # conv2d expects (batch, channels, H, W)
Y = Y.reshape(1, 1, 6, 7)

for i in range(10):
    Y_hat = conv(X)
    loss = (Y_hat - Y) ** 2   # element-wise squared error
    conv.zero_grad()
    loss.sum().backward()     # summed loss
    conv.weight.data[:] -= 3e-2 * conv.weight.grad
    if i % 2 == 0:
        print(f"Loss at epoch {i} is {loss.sum()}")
When I run this for 10 epochs (as shown in the code above) and then print out the kernel's weights, I get this:
Parameter containing:
tensor([[[[ 0.9466, -1.0235]]]], requires_grad=True)
This is very close to the true values.
However, when I change the loss calculation to nn.MSELoss, like so:
...
loss = nn.MSELoss()(Y_hat, Y)
...
loss.backward()
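(For completeness, the full modified loop would look like this; a sketch, with everything else unchanged from above, including the 3e-2 learning rate:)

criterion = nn.MSELoss()  # default reduction='mean': averages over all 42 elements

for i in range(10):
    Y_hat = conv(X)
    loss = criterion(Y_hat, Y)
    conv.zero_grad()
    loss.backward()           # loss is already a scalar (the mean)
    conv.weight.data[:] -= 3e-2 * conv.weight.grad
    if i % 2 == 0:
        print(f"Loss at epoch {i} is {loss}")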
With this change, training for only 10 epochs does not approximate the kernel at all (I get something like tensor([[[[-0.0427, 0.2747]]]], requires_grad=True)).
And when I train for much longer (say 150 epochs or so), the approximation gets somewhat better, like so: tensor([[[[ 0.7467, -0.7456]]]], requires_grad=True).
I am curious why there is such a huge difference in the number of epochs needed just because I changed the loss calculation slightly.
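For what it's worth, the only difference I can see between the two losses is the reduction: nn.MSELoss averages where my manual version sums, so (assuming I read the docs right) they should differ by a factor of Y.numel() = 42. This little check seems to confirm that:

Y_hat = conv(X)
manual = ((Y_hat - Y) ** 2).sum()
mse = nn.MSELoss()(Y_hat, Y)            # default reduction='mean'
mse_sum = nn.MSELoss(reduction='sum')(Y_hat, Y)

print(torch.isclose(manual, mse * Y.numel()))  # True: mean = sum / 42
print(torch.isclose(manual, mse_sum))          # True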
Thanks in advance for any pointers to good resources.
Regards,
Shubhadeep