Hi all,
I am using PyTorch to generate adversarial samples.
When backpropagating the error to the input elements in .eval() mode, I observe that the gradient values vary slightly when I change the batch size from 1 to something larger.
Does anyone know whether this is expected, or have I made a mistake?

Batchsize>1
-0.012502569
-0.06345608
0.024056617
0.00014444088
0.030317063
-0.049938474
0.0059496835

Batchsize=1
-0.012502571
-0.06345608
0.024056613
0.00014443748
0.030317055
-0.049938496
0.005949676

For reproducibility, I have used the following commands:

torch.backends.cudnn.deterministic = True
torch.manual_seed(1)
np.random.seed(1)

This is expected. The typical origin of such fluctuations is that floating-point addition is not associative, so the result of a large sum depends on the order in which the terms are added. Changing the batch size can change how the computational kernels lay out and access data, and with it the summation order, so these differences happen and are outside of any “guarantees” of determinism.
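As a minimal plain-Python sketch of this order dependence (the values are chosen only for illustration and have nothing to do with the model in question), regrouping the same terms changes the rounded result:

```python
# Same three terms, grouped differently: the rounded results differ.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# With very different magnitudes the effect is much more visible.
big_first = (1e16 + 1.0) - 1e16   # the 1.0 is lost when added to 1e16
big_last = (1e16 - 1e16) + 1.0    # the 1.0 survives
print(big_first, big_last)        # 0.0 1.0
```

The differences reported in the question are of the first kind: a change in the last digit or two of a float32 value.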

To my knowledge, the gradient of the loss function w.r.t. an input pattern in .eval() mode is independent of the other input patterns in the mini-batch. As you mentioned, the number of summations increases with a larger batch size, but these summations can be grouped into independent calculations, each corresponding to an individual input pattern. Thus I would expect the calculations to result in the same gradient. Do you have any thoughts on this?

The crucial point is that this isn’t about the “theoretical maths” but about numerical precision.

So this:

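```
import torch

a = torch.randn(5, 5)
b = torch.zeros(10, 10)  # larger zero tensor that will hold a copy of a
b[:5, :5] = a
a.sum() - b.sum()  # mathematically zero, but usually not exactly zero in float32
```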

will typically give something like `tensor(-5.3644e-07)`.
Even
`a.sum() - b[:5,:5].sum()`
will usually not give exactly zero.

This is because the PyTorch functions make no guarantee about the order in which the 25 nonzero elements are summed, and the sum actually depends on that order due to limited numerical precision.
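For the gradient question itself, a practical check is to compare the two results with a tolerance rather than expecting bitwise equality. The following is only a sketch under assumptions: the small `nn.Sequential` model, the random data, the helper `input_grad`, and the tolerances are all hypothetical stand-ins, and the loss uses `reduction="sum"` so that each sample's gradient does not get rescaled by the batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Hypothetical small model standing in for the real network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

x = torch.randn(8, 16)          # batch of 8 illustrative inputs
y = torch.randint(0, 4, (8,))   # illustrative class labels
loss_fn = nn.CrossEntropyLoss(reduction="sum")

def input_grad(inputs, targets):
    """Return the gradient of the summed loss w.r.t. the inputs."""
    inputs = inputs.clone().requires_grad_(True)
    loss = loss_fn(model(inputs), targets)
    (grad,) = torch.autograd.grad(loss, inputs)
    return grad

# Gradient for the first sample, computed inside a batch of 8 and alone.
grad_batched = input_grad(x, y)[0]
grad_single = input_grad(x[:1], y[:1])[0]

max_abs_diff = (grad_batched - grad_single).abs().max().item()
same_within_tol = torch.allclose(grad_batched, grad_single, rtol=1e-4, atol=1e-6)
print(max_abs_diff, same_within_tol)
```

If `same_within_tol` is true and `max_abs_diff` is at the level of float32 rounding, the two gradients agree as far as the arithmetic allows; a much larger gap would instead point to a real batch dependence, for example a layer that is still in training mode.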

Best regards

Thomas

Thanks Thomas for the clear response. I understand.
Best