# Why is the output different when batching?

I thought that for the same input we should get the same output, no matter the batch size.

But I tried a small experiment, and it doesn't confirm my intuition…

I define a simple network:

```
import torch
import torch.nn as nn

smol = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU()
)
```

Then I try two inputs: a batch with only sample x, and a batch with samples x and y. I expect the output for sample x to be the same in both cases, but it's not:

```
x = torch.rand([1, 256])                           # Batch with only x
xy = torch.cat([x, torch.rand([1, 256])], dim=0)   # Batch with x and another sample

out_alone = smol(x)
out_batch = smol(xy)

# Compare x's output alone with x's row in the batched output
assert torch.equal(out_alone, out_batch[:1]), f"\n{out_alone[0, :15]}\n{out_batch[0, :15]}"
```

```
AssertionError:
tensor([0.0870, 0.0757, 0.0076, 0.0000, 0.0000, 0.0000, 0.0032, 0.0976, 0.0000,
        0.0648, 0.2508, 0.0000, 0.1737, 0.0546, 0.0043],
tensor([0.0870, 0.0757, 0.0076, 0.0000, 0.0000, 0.0000, 0.0032, 0.0976, 0.0000,
        0.0648, 0.2508, 0.0000, 0.1737, 0.0546, 0.0043],
```
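Measuring the discrepancy directly makes the magnitude clear. Here is a sketch of the same experiment (fresh random weights, so exact numbers will vary) that prints the maximum absolute difference between x's row in both outputs and checks it with a tolerance-based comparison:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed only for reproducibility of the sketch

smol = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
)

x = torch.rand([1, 256])
xy = torch.cat([x, torch.rand([1, 256])], dim=0)

out_alone = smol(x)
out_batch = smol(xy)

# Maximum elementwise difference between x's output in the two runs:
# tiny (around float32 rounding scale), even when not bitwise zero.
diff = (out_alone - out_batch[:1]).abs().max()
print(diff)

# A tolerance-based comparison passes even when torch.equal fails.
print(torch.allclose(out_alone, out_batch[:1], atol=1e-5))
```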

Colab notebook to reproduce it

Even if it seems to be only a precision difference, why is the output not exactly the same?

Why does the output depend on the batch size when the input is the same?

Different algorithms can be used for different input shapes, which creates these small errors due to the limited floating-point precision.
E.g. a different order of operations would also create such numerical differences, as seen here:

```
x = torch.randn(10, 10, 10)
s1 = x.sum()
s2 = x.sum(0).sum(0).sum(0)
print((s1 - s2).abs().max())
> tensor(9.5367e-07)
```
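The root cause is that floating-point addition is not associative, so a different grouping of the same additions can round differently. A minimal pure-Python sketch (IEEE-754 doubles):

```python
# Floating-point addition is not associative:
# the same three numbers summed in a different grouping round differently.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
```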

so you cannot assume bitwise-identical results when different algorithms are used.
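For this reason, numerical comparisons like the one in the question are usually done with a tolerance rather than bitwise equality. A sketch using `torch.allclose` on the summation example above:

```python
import torch

torch.manual_seed(0)  # fixed seed only for reproducibility of the sketch
x = torch.randn(10, 10, 10)
s1 = x.sum()
s2 = x.sum(0).sum(0).sum(0)

# torch.equal demands bitwise identity; torch.allclose allows
# for rounding differences and is the right tool here.
print(torch.allclose(s1, s2, atol=1e-5))
```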
