Deeplab Large FOV version 2 Trained in Caffe but not on Pytorch

Hi, I am the owner of the repo you mentioned. There is a performance gap of about 3.25 percentage points of mean IoU between the PyTorch and Caffe implementations, with PyTorch being worse: PyTorch's mean IoU is 71.13% and Caffe's is 74.39%. I am not sure why this gap occurs. There are some subtle differences between the two (mentioned here in the readme), but I don't think they would account for it.
I am creating a new optimizer after every 10 iterations (iter_size is 10 in the Caffe implementation), which causes the momentum to be lost. To prevent this, I also tried this, but it gave worse results. @smth, what do you think might be causing the difference? Thanks!
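For anyone following along, here is a minimal sketch of one way to emulate Caffe's iter_size without losing momentum: create the optimizer once, accumulate gradients over 10 iterations, and only then call `step()`. This is not necessarily what the attempt linked above did. The tiny linear model, the random data, and the learning rate are hypothetical stand-ins, not the DeepLab setup from the repo, and dividing the loss by iter_size assumes Caffe averages the accumulated gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny model and random data, not DeepLab.
model = nn.Linear(4, 2)
criterion = nn.CrossEntropyLoss()
iter_size = 10  # mirrors the Caffe iter_size from the post

# Create the optimizer ONCE so its momentum buffers persist across updates.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

num_updates = 0
optimizer.zero_grad()
for it in range(1, 31):
    inputs = torch.randn(1, 4)
    targets = torch.randint(0, 2, (1,))
    # Assumption: scale so the accumulated gradient is an average
    # over iter_size mini-batches.
    loss = criterion(model(inputs), targets) / iter_size
    loss.backward()  # gradients add up in .grad until zero_grad()
    if it % iter_size == 0:
        optimizer.step()       # one weight update per iter_size iterations
        optimizer.zero_grad()  # clear gradients, keep optimizer state
        num_updates += 1

# The momentum buffers live in optimizer.state and survive between steps.
has_momentum = all(
    "momentum_buffer" in optimizer.state[p] for p in model.parameters()
)
```

Since `zero_grad()` only clears the `.grad` tensors, the optimizer's momentum state carries over from one update to the next, which is exactly what gets thrown away when a fresh optimizer is constructed every 10 iterations.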