Different phenomena with PyTorch and Caffe on the same task

I trained a ResNet-20 model on the CIFAR-10 dataset with PyTorch and Caffe respectively, and got a top-1 accuracy of around 91.5% with both. However, the two show quite different behavior, especially during the earliest epochs. With Caffe, the earliest epochs feature a relatively large testing loss (e.g., several times larger than the training loss). With PyTorch, the testing loss and training loss stay close the whole time. To my knowledge, the behavior on Caffe is reasonable: during inference, the Batch Normalization layers use a moving average of the batch statistics (the so-called global stats), which is not stable or accurate enough during the earliest epochs. Is it a potential bug that PyTorch forgets to use the moving average for inference, or is there some other reason?

Do you mean model.eval()?
The dropout and BatchNorm layers behave differently during training and inference.
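A minimal sketch of the difference, using a standalone BatchNorm layer (the shapes and numbers are arbitrary, just chosen so the batch statistics differ clearly from the freshly initialized running stats):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 3 + 5  # batch with nonzero mean and large variance

bn.train()
out_train = bn(x)  # normalizes with this batch's own statistics
# (this forward pass also updates running_mean / running_var)

bn.eval()
out_eval = bn(x)   # normalizes with the accumulated running statistics

# In train mode the per-channel output mean is ~0; in eval mode it is not,
# because the running stats still differ a lot from this batch's stats.
print(out_train.mean(dim=0))  # ≈ 0 per channel
print(out_eval.mean(dim=0))
```

Early in training the running stats are far from the true statistics, which is exactly why the eval-mode (testing) loss can be much larger than the training loss in the first epochs.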

I did use model.eval() during inference, which is supposed to make the BN layers use the global stats.

AFAIK, there are no separate global stats at inference time in PyTorch; it only uses the running mean and running variance.
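Those running estimates are exponential moving averages updated on every training-mode forward pass; a small check of the update rule (with BatchNorm's default momentum of 0.1):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)          # default momentum = 0.1
x = torch.randn(16, 3) + 2.0

bn.train()
bn(x)  # one training-mode forward pass updates the running stats

# running_mean follows: new = (1 - momentum) * old + momentum * batch_mean,
# and the initial running_mean is 0.
expected_mean = 0.1 * x.mean(dim=0)
print(torch.allclose(bn.running_mean, expected_mean, atol=1e-6))
```

So in PyTorch the running mean/var play the same role as Caffe's global stats; they just need many batches before they settle.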

So, is it due to different settings of the mean and variance during inference in PyTorch and Caffe?

According to https://pytorch.org/docs/stable/_modules/torch/nn/modules/batchnorm.html, the parameter track_running_stats seems to be similar to use_global_stats in Caffe. When it is set to False, it amounts to saying use_global_stats = false in both the train and test phases.
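A quick sketch of that setting: with track_running_stats=False, the layer keeps no running estimates at all, so it normalizes with the current batch's statistics even in eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4) * 3 + 5

# No running estimates are kept; the layer always uses batch statistics,
# even in eval mode (analogous to use_global_stats: false in Caffe).
bn = nn.BatchNorm1d(4, track_running_stats=False)
print(bn.running_mean is None and bn.running_var is None)

bn.eval()
out = bn(x)
# Batch-normalized despite being in eval mode:
print(out.mean(dim=0))  # ≈ 0 per channel
```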

However, I think the behavior of batch norm in PyTorch and Caffe still has some minor differences.

What I would do is convert the PyTorch model to a caffemodel. Python seems hard for me to use.