Non-reproducible results during evaluation

Hi,

I have trained a bidirectional GRU, and during evaluation (when I load the weights from the checkpoint) I get different results every time, even though the input is exactly the same and I run on CPU. Dropout is removed and eval mode is set. I have checked everything I can think of, but I don’t know where else to look.
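Here is roughly what the evaluation script looks like; the GRU sizes and the checkpoint name below are simplified placeholders:

```python
import torch
import torch.nn as nn

# Placeholder sizes; the real model is a bidirectional GRU as well.
model = nn.GRU(input_size=32, hidden_size=64, batch_first=True, bidirectional=True)
model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
model.eval()  # dropout is off in eval mode anyway

x = torch.ones(1, 10, 32)  # identical input on every run

with torch.no_grad():
    out, _ = model(x)
print(out.abs().sum())  # in my setup, this value differs from run to run
```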

One silly reason (which has happened to me) could be the transforms in your loader. If the loader has shuffle=True, or something like RandomCrop() or RandomHorizontalFlip(), then the input to the network changes every time you run your code, so naturally the output changes slightly every time.
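For example, a deterministic evaluation pipeline would look roughly like this (the dataset path and sizes are made up):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Deterministic transforms only: no RandomCrop / RandomHorizontalFlip here.
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("data/val", transform=eval_tf)  # hypothetical path

# shuffle=False so the same samples arrive in the same order on every run.
loader = DataLoader(dataset, batch_size=32, shuffle=False)
```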

Thanks, but the input is not preprocessed. I get different results running the GRU over the same input each time I run the script.

Not sure if this’ll help at all, but what happens when you manually set the rng seed to something like 0? http://pytorch.org/docs/master/torch.html#torch.manual_seed
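Something like this at the very top of the script, before anything else runs (just a sketch; the seed value is arbitrary):

```python
import random
import numpy as np
import torch

def set_seed(seed=0):
    # Seed every RNG that could plausibly influence the run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # harmless no-op on CPU-only setups

set_seed(0)
```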

This should not matter for evaluation, right? Especially on CPU, since there is no randomness involved (no dropout, no cuDNN).
I tried fixing the numpy and torch random seeds anyway, and there is still variation during the GRU forward pass.

By "not preprocessed" do you mean you have no transforms at all?

Can you post a minimal working example? Using a dict somewhere in your code may also cause this kind of problem.
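Iterating over a dict (before Python 3.7) or over a set of strings can follow a different order on each interpreter run because of hash randomization, and a different accumulation order is already enough to change floating-point results slightly. A toy illustration:

```python
# Run this script twice: unless PYTHONHASHSEED is fixed, the iteration order
# of a string set (and, pre-3.7, of a plain dict) can differ between runs.
parts = {"encoder", "decoder", "classifier"}
print(list(parts))

# Floating-point addition is not associative, so summing in a run-dependent
# order produces slightly different totals.
values = {"encoder": 0.1, "decoder": 0.2, "classifier": 0.3}
print(sum(values[p] for p in parts))
```

Launching with `PYTHONHASHSEED=0 python eval.py` pins the hash order, which is a quick way to test whether this is the cause (the script name here is just an example).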