Conflict between model.eval() and .train() with multiprocess training and evaluation

Turns out this is related to the thread "Performance highly degraded when eval() is activated in the test phase".

According to the people in that thread, it's a bug in PyTorch's definition of batchnorm: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py

Their solution only partially solved my discrepancy.
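
For anyone landing here, a rough framework-free sketch of why train() and eval() can give different outputs with batchnorm. This is not PyTorch's code: `TinyBatchNorm` and its attributes are hypothetical names, and I am only assuming the usual behaviour (batch statistics in training mode, momentum-averaged running statistics in eval mode, default momentum 0.1).

```python
# Simplified 1-D batch norm, only to illustrate train vs eval behaviour.
# NOT PyTorch's implementation; all names here are hypothetical.
from statistics import fmean


class TinyBatchNorm:
    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0  # assumed initial running statistics
        self.running_var = 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            # Training mode: normalise with this batch's own statistics.
            n = len(batch)
            mean = fmean(batch)
            var = fmean([(x - mean) ** 2 for x in batch])  # biased variance
            # Update the running statistics with an exponential moving average
            # (the running variance uses the unbiased estimate).
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var * n / (n - 1)
        else:
            # Eval mode: normalise with the accumulated running statistics.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


bn = TinyBatchNorm()
batch = [10.0, 12.0, 14.0]

train_out = bn(batch)   # uses the batch mean/variance
bn.training = False
eval_out = bn(batch)    # uses running stats that have seen only one update

print(train_out)
print(eval_out)
```

After a single update the running statistics are still far from the real data statistics, so the same batch comes out very differently in eval mode. If the running statistics in a real model lag behind or are updated inconsistently (for example across processes), a similar gap between training and evaluation could show up.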
