Hi there!
I am working on implementing my own version of SGD, but when I check whether the losses match at every iteration, they don't. Could anyone please help? Thanks in advance.
The code used is below:
import torch as t
from torch.autograd import Variable as V
from copy import deepcopy
x = V(t.randn(100, 4))
y = V(t.randn(100))
model_1 = t.nn.Sequential(t.nn.Linear(4, 8), t.nn.Linear(8, 4), t.nn.Linear(4, 2), t.nn.Linear(2, 1))
model_2 = deepcopy(model_1)
loss_1 = t.nn.MSELoss()
loss_2 = deepcopy(loss_1)
opt = t.optim.SGD(model_2.parameters(), lr=0.001)
for i in range(0, 10):
    print('Not using OPTIM: %f\tUsing OPTIM: %f' % (loss_1(model_1(x), y).data[0], loss_2(model_2(x), y).data[0]))
    # manual update path
    loss_1.zero_grad()
    loss_1(model_1(x), y).backward()
    for param in model_1.parameters():
        param.data = param.data - 0.001*param.grad.data
    # optim.SGD path
    opt.zero_grad()
    loss_2(model_2(x), y).backward()
    opt.step()
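For reference, by "my own version of SGD" I just mean the vanilla update w <- w - lr * grad, with no momentum or weight decay. A minimal standalone sketch of that single step (the tensor names below are made up purely for illustration, not taken from the code above):

import torch as t
from torch.autograd import Variable as V

# hypothetical single parameter and data, just to illustrate the update rule
w = V(t.randn(4, 1), requires_grad=True)
x = V(t.randn(100, 4))
y = V(t.randn(100, 1))
lr = 0.001

loss = ((x.mm(w) - y) ** 2).mean()   # MSE computed by hand
loss.backward()                      # fills w.grad
w.data -= lr * w.grad.data           # the plain SGD step: w <- w - lr * dL/dw
w.grad.data.zero_()                  # clear the gradient before the next backward pass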
The output I got in one such run is as follows:
Not using OPTIM: 1.185485 Using OPTIM: 1.185485
Not using OPTIM: 1.183839 Using OPTIM: 1.183839
Not using OPTIM: 1.180592 Using OPTIM: 1.182216
Not using OPTIM: 1.175832 Using OPTIM: 1.180614
Not using OPTIM: 1.169687 Using OPTIM: 1.179034
Not using OPTIM: 1.162323 Using OPTIM: 1.177475
Not using OPTIM: 1.153939 Using OPTIM: 1.175938
Not using OPTIM: 1.144764 Using OPTIM: 1.174421
Not using OPTIM: 1.135046 Using OPTIM: 1.172924
Not using OPTIM: 1.125052 Using OPTIM: 1.171448