I thought that for the same input we should get the same output, no matter the batch size.
But I tried a small experiment, and it doesn't confirm my intuition…
I define a simple network:
import torch
import torch.nn as nn
smol = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU()
)
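As a sanity check, it may help to first rule out randomness within a single batch size. This is a minimal sketch (with a hypothetical seed, not from the original post): running the exact same batch through the network twice gives bit-identical results, so the forward pass itself is deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # hypothetical seed, just for reproducibility

net = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU()
)

x = torch.rand([1, 256])
with torch.no_grad():
    a = net(x)
    b = net(x)

# Same input, same batch shape: the same kernels run in the same order,
# so the outputs are bit-for-bit equal.
print(torch.equal(a, b))  # True
```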
Then I try two inputs: a batch with only sample x, and a batch with samples x and y. I expect the output for sample x to be the same in both cases, but it isn't:
x = torch.rand([1, 256])  # Batch with only x
xy = torch.cat([x, torch.rand([1, 256])], dim=0)  # Batch with x and another sample
out_alone = smol(x)[0]
out_batch = smol(xy)[0]
assert torch.equal(out_alone, out_batch), f"\n{out_alone[:15]}\n{out_batch[:15]}"
AssertionError:
tensor([0.0870, 0.0757, 0.0076, 0.0000, 0.0000, 0.0000, 0.0032, 0.0976, 0.0000,
0.0648, 0.2508, 0.0000, 0.1737, 0.0546, 0.0043],
grad_fn=<SliceBackward0>)
tensor([0.0870, 0.0757, 0.0076, 0.0000, 0.0000, 0.0000, 0.0032, 0.0976, 0.0000,
0.0648, 0.2508, 0.0000, 0.1737, 0.0546, 0.0043],
grad_fn=<SliceBackward0>)
Colab notebook to reproduce it
Even if it seems to be only a precision difference, why is the output not exactly the same?
Why does the output depend on the batch size, when the input is the same?
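My understanding (an assumption, not something I've verified in the PyTorch source) is that a different batch size can dispatch the underlying matrix multiply to a different kernel or a different reduction order, and since floating-point addition is not associative, the results can differ in the last bits while still being equal for all practical purposes. A minimal sketch to confirm the gap is only at float32 epsilon scale (re-creating the toy network above with a hypothetical seed):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # hypothetical seed, just for reproducibility

smol = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU()
)

x = torch.rand([1, 256])
xy = torch.cat([x, torch.rand([1, 256])], dim=0)

with torch.no_grad():
    out_alone = smol(x)[0]
    out_batch = smol(xy)[0]

# The outputs may not be bit-identical, but any gap is tiny
# (on the order of float32 machine epsilon, possibly exactly 0).
diff = (out_alone - out_batch).abs().max().item()
print(diff)
print(torch.allclose(out_alone, out_batch))  # True
```

If this is right, `torch.equal` (bitwise equality) is simply too strict a test here, and `torch.allclose` is the appropriate comparison.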