[GRU] Different results between batched and one by one inference

Hi,

I’m using PyTorch 1.7.1 on CPU and I’m getting inconsistent results during inference over the same data.
It seems that the GRU implementation gives slightly different results for sample-by-sample prediction vs. batched prediction.

Here is some code to reproduce the problem:

import torch

a = torch.randn((128, 500, 4))
layer = torch.nn.GRU(4, 4, num_layers=2, dropout=0.0, bidirectional=True, batch_first=True)
layer.eval()

with torch.no_grad():
    # run the whole batch at once
    batched, _ = layer(a)
    # run the same samples one by one
    obo = torch.zeros(batched.shape)
    for i in range(128):
        obo[i] = layer(a[[i]])[0][0]

batched == obo  # elementwise comparison; I expected all True

I was expecting this comparison to return only True, but it returns a mix of True and False instead. Is this normal, or is it a PyTorch problem?

Thank you for your time 🙂

If the relative error is in the ~1e-6 range, the difference is most likely due to the limited precision of float32, and you should see a lower error using e.g. float64.
A better way to compare floating-point numbers is torch.allclose, which compares both tensors using small relative and absolute tolerances (rtol and atol) instead of exact equality.
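
For example, continuing the snippet above (the exact tolerances are a judgment call and should be adapted to your use case):

# tolerance-based comparison instead of exact equality
print(torch.allclose(batched, obo, rtol=1e-5, atol=1e-6))  # should be True here
# worst-case absolute difference between the two runs
print((batched - obo).abs().max())  # around 1e-6 in float32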

Thank you for the answer!

The error is indeed in the ~1e-6 range right after the RNN layers.
Are you aware of any way to attenuate this behaviour? In my actual code, after a few more layers, the error grows to the 1e-4 range, which causes non-reproducibility problems during inference.
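
For reference, this is the kind of check I run between layers (max_rel_err is just a small helper I wrote, not a PyTorch function):

def max_rel_err(x, y, eps=1e-12):
    # worst-case relative error between two tensors, guarding against division by zero
    return ((x - y).abs() / (y.abs() + eps)).max().item()

print(max_rel_err(obo, batched))  # ~1e-6 right after the GRU, ~1e-4 a few layers later in my model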

Are you seeing non-deterministic behavior within a single approach while following the reproducibility docs, or only when you compare the batched vs. single-sample approach?

I have already tried changing the seed and setting PyTorch to deterministic mode, to no avail.
I found a dirty hack: using quantization-aware training seems to attenuate the negative impact of this.
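
For completeness, this is roughly the determinism setup I had tried (torch.set_deterministic is the 1.7-era API; later releases renamed it to torch.use_deterministic_algorithms):

import torch

torch.manual_seed(0)
torch.set_deterministic(True)  # raises an error on known nondeterministic ops

# Note: this only controls run-to-run determinism; it does not make
# batched and single-sample forward passes numerically identical.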

Hello, I’m suffering from the same problem.
In my case, after 28 layers, the error is in the 1e-2 range.
How did you finally solve it? Thank you very much.