Hi all,
I am using PyTorch to generate adversarial samples.
When backpropagating the error to the input elements in .eval() mode, I observe that changing the batch size from 1 to something larger makes the gradient values vary slightly.
Does anyone know whether this is expected, or have I made a mistake?
Two typical gradients are:

This is expected. The typical origin of such fluctuations is that floating point addition is not associative, so the result of a large sum depends on the order in which the terms are added. Changing the batch size can change how the computational kernels lay out and traverse the data, so these small differences happen and are outside of any “guarantees” of determinism.
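The non-associativity can be illustrated in plain Python, without PyTorch: with IEEE 754 doubles, regrouping the same three additions changes the rounded result. The specific values below are contrived just to make the rounding visible.

```python
# Floating point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 0.0 + 1.0            -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, the 1.0 is lost -> 0.0

print(left, right)   # 1.0 0.0
```

A batched backward pass performs many such sums, and the kernel is free to group them differently depending on batch size, so tiny discrepancies like the one above accumulate into the small gradient differences you observed.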

Thanks, Thomas, for your response.
To my knowledge, the gradient of the loss function w.r.t. an input pattern in .eval() mode is independent of the other input patterns in the mini-batch. As you mentioned, the number of summations grows with the batch size, but these summations can be grouped into independent calculations, each corresponding to an individual input pattern. I would therefore expect the calculations to produce identical gradients. Do you have any thoughts on this?

The crucial point is that this isn’t about the “theoretical maths” but about numerical precision.

So this:

import torch

a = torch.randn(5, 5)
b = torch.zeros(10, 10)
b[:5, :5] = a
a.sum() - b.sum()

will typically give something like tensor(-5.3644e-07).
Even a.sum() - b[:5,:5].sum()
will usually not give exactly zero.

This is because the PyTorch functions make no guarantee about the order in which the 25 nonzero elements are summed, and due to limited floating point precision the result actually depends on that order.
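The order dependence can be reproduced without PyTorch as well: summing the exact same list of elements in two different orders gives two different floating point answers, while math.fsum (which tracks the rounding error) recovers the exact sum. The values are contrived so the discrepancy is deterministic.

```python
import math

vals = [1e16, 1.0, -1e16, 1.0]

forward = sum(vals)            # ((1e16 + 1.0) - 1e16) + 1.0 -> 1.0
reordered = sum(sorted(vals))  # ((-1e16 + 1.0) + 1.0) + 1e16 -> 0.0
exact = math.fsum(vals)        # error-free summation         -> 2.0

print(forward, reordered, exact)
```

So two mathematically identical sums of the same 25 elements can legitimately disagree in their last bits, which is exactly the scale of difference seen between the two gradients above.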