[GRU] Different results between batched and one by one inference


I’m using PyTorch 1.7.1 on CPU and I’m getting inconsistent results during inference over the same data.
It seems that the GRU implementation gives slightly different results for sample-by-sample prediction vs. batched prediction.

Here is code to reproduce the problem:

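import torch

a = torch.randn((128, 500, 4))
layer = torch.nn.GRU(4, 4, num_layers=2, dropout=0.0, bidirectional=True, batch_first=True)

# Batched prediction: all 128 sequences in one forward pass
batched, _ = layer(a)

# One-by-one prediction: each sequence as its own batch of size 1
obo = torch.zeros(batched.shape)
for i in range(128):
    obo[i] = layer(a[[i]])[0][0]

# Element-wise exact comparison
print(batched == obo)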

I was expecting this code to return only True, but instead it returns a mix of True and False. Is this normal, or is it a PyTorch problem?

Thank you for your time :slight_smile:

If the relative error is around 1e-6, the difference is most likely due to the limited precision of float32, and you should get a lower error using e.g. float64.
A better way to compare floating point tensors is torch.allclose, which checks that the values match within a small tolerance (the rtol and atol arguments) instead of requiring exact equality.
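
To make the suggestion above concrete, here is a minimal sketch that compares batched and one-by-one outputs with torch.allclose and then repeats the check in float64. The shapes are smaller than in the original post to keep it fast, and the helper name batched_vs_one_by_one and the tolerance values are my own choices for illustration, not anything recommended by the PyTorch docs.

```python
import torch

torch.manual_seed(0)

# Smaller input than in the original post, same layer configuration.
a = torch.randn(8, 50, 4)
layer = torch.nn.GRU(4, 4, num_layers=2, bidirectional=True, batch_first=True)


def batched_vs_one_by_one(layer, x):
    """Hypothetical helper: run x through layer as one batch and sample by sample."""
    with torch.no_grad():
        batched, _ = layer(x)
        obo = torch.cat([layer(x[i:i + 1])[0] for i in range(x.shape[0])], dim=0)
    return batched, obo


# float32: compare with a tolerance instead of ==
batched32, obo32 = batched_vs_one_by_one(layer, a)
max_err32 = (batched32 - obo32).abs().max().item()
close32 = torch.allclose(batched32, obo32, rtol=1e-4, atol=1e-5)

# float64: .double() converts the module in place, so this runs after the float32 check
batched64, obo64 = batched_vs_one_by_one(layer.double(), a.double())
max_err64 = (batched64 - obo64).abs().max().item()

print(f"float32: allclose={close32}, max abs diff={max_err32:.3e}")
print(f"float64: max abs diff={max_err64:.3e}")
```

If the float64 difference is many orders of magnitude smaller than the float32 one, that points to rounding rather than a bug in the layer.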

Thank you for the answer !

The error does seem to be in the 1e-6 range right after the RNN layers.
Are you aware of any way to attenuate this behaviour? In my actual code, after a few layers, the error is closer to the 1e-4 range, which causes reproducibility problems during inference.
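
For anyone who wants to see how the discrepancy evolves with depth, here is a rough diagnostic sketch. The stack of four small GRU layers is a hypothetical stand-in for a real model, and per_layer_max_diff is a name I made up; it only measures the difference after each layer and does not fix anything.

```python
import torch

torch.manual_seed(0)

# Hypothetical stack of GRU layers standing in for a deeper model.
layers = [torch.nn.GRU(4, 4, batch_first=True) for _ in range(4)]
x = torch.randn(8, 50, 4)


def per_layer_max_diff(layers, x):
    """Return the max abs difference between batched and one-by-one outputs after each layer."""
    diffs = []
    batched = x
    singles = [x[i:i + 1] for i in range(x.shape[0])]
    with torch.no_grad():
        for layer in layers:
            batched, _ = layer(batched)
            singles = [layer(s)[0] for s in singles]
            stacked = torch.cat(singles, dim=0)
            diffs.append((batched - stacked).abs().max().item())
    return diffs


diffs = per_layer_max_diff(layers, x)
for depth, d in enumerate(diffs, start=1):
    print(f"after layer {depth}: max abs diff = {d:.3e}")
```

Running the same measurement on a float64 copy of the model is one way to check whether the growth is rounding accumulating through the layers.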

Are you seeing non-deterministic behavior within a single approach while following the reproducibility docs, or only when you compare the batched vs. single-sample approach?
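
A quick way to tell the two cases apart is sketched below: run the same approach twice and check for exact equality, then compare across approaches. As far as I know, torch.use_deterministic_algorithms exists in PyTorch 1.8 and later (1.7 had torch.set_deterministic instead), so the call is guarded. Pinning the thread count is my own precaution, not something the question requires.

```python
import torch

torch.manual_seed(0)
torch.set_num_threads(1)  # assumption: fixing the thread count removes one source of variation
if hasattr(torch, "use_deterministic_algorithms"):
    torch.use_deterministic_algorithms(True)

layer = torch.nn.GRU(4, 4, num_layers=2, bidirectional=True, batch_first=True)
layer.eval()
a = torch.randn(8, 50, 4)

with torch.no_grad():
    # Same approach, run twice
    run1, _ = layer(a)
    run2, _ = layer(a)
    single1, _ = layer(a[:1])
    single2, _ = layer(a[:1])

same_batched = torch.equal(run1, run2)       # exact equality, batched vs batched
same_single = torch.equal(single1, single2)  # exact equality, single vs single
max_cross_diff = (run1[:1] - single1).abs().max().item()  # batched vs single

print(f"batched run-to-run identical: {same_batched}")
print(f"single run-to-run identical:  {same_single}")
print(f"batched vs single max abs diff: {max_cross_diff:.3e}")
```

If each approach is identical from run to run and only the cross comparison differs, the issue is not randomness but a different order of floating point operations for different batch sizes, which seeds and deterministic mode do not change.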

I have already tried changing the seed and setting PyTorch to deterministic mode, to no avail.
I found a dirty hack: using quantization-aware training seems to attenuate the negative impact of this.

Hello, I ran into the same problem.
In my case, after 28 layers, the error is closer to the 1e-2 range.
So, how did you finally solve it? Thank you very much.