I’m trying to port some third-party code from Torch to PyTorch, but I’m getting slower convergence in PyTorch. In both cases, the code does vanilla SGD without momentum on a supervised classification task.
Are there any differences between the default implementations of SGD in Torch and PyTorch?
Below is the loss function from the Torch code. Is it equivalent to PyTorch’s cross-entropy loss?
function classes_learner:learn(buffer)
  local obs, classes = buffer.obs, buffer.classes
  -- add epsilon to the network output so the log below stays finite
  local output = self.net:forward(obs):float():add(self.epsilon)
  local log_output = torch.log(output)
  local neg_err = 0
  self.output_err:resizeAs(output):zero()
  local batch_size = classes:size(1)
  for i=1,batch_size do
    -- accumulate the log of the target-class output
    neg_err = neg_err + log_output[i][classes[i]]
    -- gradient of -log(p) with respect to p is -1/p
    self.output_err[i][classes[i]] = -1 / output[i][classes[i]]
  end
  -- average the gradient and the error over the batch
  self.output_err:div(batch_size)
  neg_err = neg_err / batch_size
  self.net:backward(obs, self.output_err)
  self.optim_err = self.optim_err - neg_err
end
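For comparison, here is a minimal pure-Python sketch of what the loop above computes, next to what a logit-based cross-entropy with mean reduction computes. It assumes the Torch network's output rows are already probabilities (for example from a final softmax layer, which the snippet does not show). The function names `lua_style_loss` and `cross_entropy_from_logits` are made up for illustration, and class indices are 0-based here, whereas the Lua code is 1-based.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one row of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lua_style_loss(probs, classes, epsilon=0.0):
    # Mirrors the Lua loop: mean of -log(p[target] + epsilon) over the batch.
    total = 0.0
    for row, target in zip(probs, classes):
        total += math.log(row[target] + epsilon)
    return -total / len(classes)

def cross_entropy_from_logits(logits, classes):
    # Log-softmax followed by negative log-likelihood, averaged over the batch.
    total = 0.0
    for row, target in zip(logits, classes):
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += row[target] - log_norm
    return -total / len(classes)

# Hypothetical toy batch: two samples, three classes.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
classes = [0, 2]
probs = [softmax(row) for row in logits]

loss_a = lua_style_loss(probs, classes)              # loss on probabilities
loss_b = cross_entropy_from_logits(logits, classes)  # loss on raw logits
# Passing probabilities where logits are expected applies softmax twice.
loss_double = cross_entropy_from_logits(probs, classes)
print(loss_a, loss_b, loss_double)
```

Under that assumption and with `epsilon = 0`, the two losses agree, so the Lua code would correspond to PyTorch’s cross-entropy loss applied to raw logits (the PyTorch loss applies log-softmax internally). A nonzero epsilon makes the Lua loss differ slightly. One thing that may be worth checking in the port is whether a softmax layer was kept in front of the cross-entropy loss: the third value shows that this computes a different loss, which may flatten the gradients.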