Loss trend different from Caffe

I previously trained C3D in Caffe, and I am now trying to reproduce that training in PyTorch. However, the loss in PyTorch does not follow the same trend I got in Caffe. I am using the SGD solver, and I have read that the SGD implementation in PyTorch is slightly different. In Caffe the loss decreased faster; in PyTorch it does not come down as quickly. Is this expected behaviour?
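
For context, the difference I read about seems to be the one mentioned in the PyTorch SGD documentation: PyTorch multiplies the learning rate into the momentum buffer at update time, while Caffe folds it into the buffer itself. Below is a minimal sketch of the two rules in plain Python, assuming momentum is enabled with no dampening, weight decay, or Nesterov. The function names, gradient values, and learning-rate schedules are made up for illustration and are not from my actual training setup.

```python
def pytorch_style_sgd(grads, lrs, momentum=0.9, p=1.0):
    """PyTorch-style rule: v = mu * v + g;  p = p - lr * v."""
    v = 0.0
    for g, lr in zip(grads, lrs):
        v = momentum * v + g
        p = p - lr * v
    return p


def caffe_style_sgd(grads, lrs, momentum=0.9, p=1.0):
    """Caffe-style rule: v = mu * v + lr * g;  p = p - v."""
    v = 0.0
    for g, lr in zip(grads, lrs):
        v = momentum * v + lr * g
        p = p - v
    return p


# Hypothetical, fixed gradients for a single scalar parameter.
grads = [0.5, 0.4, 0.3, 0.2]
constant_lr = [0.1, 0.1, 0.1, 0.1]    # learning rate never changes
stepped_lr = [0.1, 0.1, 0.01, 0.01]   # learning rate drops mid-run

gap_constant = abs(pytorch_style_sgd(grads, constant_lr)
                   - caffe_style_sgd(grads, constant_lr))
gap_stepped = abs(pytorch_style_sgd(grads, stepped_lr)
                  - caffe_style_sgd(grads, stepped_lr))

print("gap with constant lr:", gap_constant)
print("gap with stepped lr: ", gap_stepped)
```

If this sketch is right, the two rules agree while the learning rate is constant and only diverge after the learning rate changes, so on its own this difference may not explain a slower loss decrease from the very first iterations.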