Porting code from Torch to PyTorch

I’m trying to port some 3rd-party code from Torch to PyTorch, but for some reason I’m getting slower convergence in PyTorch. In both cases, it’s just doing vanilla SGD without momentum on a supervised classification task.

Are there any differences between the default implementations of SGD in Torch and PyTorch?
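For reference, here is a minimal sketch of what "vanilla SGD without momentum" looks like on the PyTorch side. The model, data, and learning rate are hypothetical stand-ins, not taken from the code being ported. It spells out every `torch.optim.SGD` option explicitly and checks that one step reduces to `param - lr * grad`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: the real model, data, and learning rate are not shown in the post.
model = torch.nn.Linear(5, 3)
lr = 0.1
x = torch.randn(8, 5)
y = torch.randint(0, 3, (8,))

# Every option that could make this differ from plain SGD is set explicitly.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=lr,
    momentum=0.0,
    dampening=0.0,
    weight_decay=0.0,
    nesterov=False,
)

before = model.weight.detach().clone()

optimizer.zero_grad()
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# With the settings above, one step should be exactly: w <- w - lr * grad
expected = before - lr * model.weight.grad
update_is_plain_sgd = torch.allclose(model.weight.detach(), expected)
```

If `update_is_plain_sgd` holds, the optimizer itself is unlikely to be the source of the difference, and the loss scaling or learning rate is the more likely place to look.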

This is the loss function for the Torch code. Is this equivalent to PyTorch’s cross-entropy loss?

function classes_learner:learn(buffer)
  local obs, classes = buffer.obs, buffer.classes
  -- Network output plus a small epsilon so log() never sees zero
  local output = self.net:forward(obs):float():add(self.epsilon)
  local log_output = torch.log(output)
  local neg_err = 0
  local batch_size = classes:size(1)
  for i=1,batch_size do
    -- Accumulate the log-likelihood of the correct class
    neg_err = neg_err + log_output[i][classes[i]]
    -- Gradient of -log(output) w.r.t. the correct-class output (not divided by batch_size)
    self.output_err[i][classes[i]] = -1 / output[i][classes[i]]
  end
  neg_err = neg_err / batch_size
  self.net:backward(obs, self.output_err)
  self.optim_err = self.optim_err - neg_err
end
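
One way to check the equivalence numerically is the sketch below. It assumes `self.net` ends in a softmax (the network is not shown above), uses a made-up epsilon, and uses random logits in place of real data. It compares the hand-written loss with `F.cross_entropy`, and also compares the gradient scale for `reduction="mean"` against `reduction="sum"`, since the Torch code passes `-1 / output` to `backward` without dividing by `batch_size`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, num_classes = 4, 3
epsilon = 1e-8  # hypothetical value; self.epsilon is not given in the post

# Random logits stand in for the pre-softmax activations of the network.
logits = torch.randn(batch_size, num_classes, requires_grad=True)
classes = torch.tensor([0, 2, 1, 2])  # 0-based labels (the Lua code is 1-based)

# Assumption: the Torch network outputs softmax probabilities.
probs = torch.softmax(logits, dim=1)

# Hand-written loss in the style of the Torch code: -mean(log(p[target] + eps)).
picked = probs[torch.arange(batch_size), classes] + epsilon
manual_loss = -torch.log(picked).mean()

# PyTorch's cross-entropy takes raw logits; the default reduction is "mean".
ce_mean = F.cross_entropy(logits, classes)
ce_sum = F.cross_entropy(logits, classes, reduction="sum")

grad_mean, = torch.autograd.grad(ce_mean, logits)
grad_sum, = torch.autograd.grad(ce_sum, logits)

# The loss values agree up to the epsilon term.
loss_matches = torch.allclose(manual_loss, ce_mean, atol=1e-5)
# The summed-loss gradient is batch_size times the mean-loss gradient.
grad_ratio_matches = torch.allclose(grad_sum, grad_mean * batch_size, atol=1e-6)
```

Under these assumptions the reported loss matches PyTorch's default, but the backpropagated gradient corresponds to the summed loss. If that holds for the real code, the Torch version takes steps `batch_size` times larger at the same learning rate, which could explain the slower convergence in PyTorch. Using `reduction="sum"` or scaling the learning rate by the batch size would be one way to test this.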