Precision of network gradient

sajjad · February 6, 2021, 11:56am

Hi all,
I am using PyTorch to generate adversarial samples.
When backpropagating the error to the input elements in .eval() mode, I observe that the gradient values vary slightly when I change the batch size from 1 to something larger.
Does anyone know whether this is expected, or have I made a mistake?
Two typical sets of gradients are:

Batch size > 1
-0.012502569
-0.06345608
0.024056617
0.00014444088
0.030317063
-0.049938474
0.0059496835

Batch size = 1
-0.012502571
-0.06345608
0.024056613
0.00014443748
0.030317055
-0.049938496
0.005949676

For reproducibility, I have used the following commands:

torch.backends.cudnn.deterministic = True
torch.manual_seed(1)
np.random.seed(1)

tom · February 6, 2021, 2:35pm

This is expected. The typical origin of such fluctuations is that floating point addition is not associative, so the result of a larger sum depends on the order in which its terms are added. Changing the batch size can typically change the layout of how the computational kernels access data, so these things happen and are outside of any “guarantees” of determinism.
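To put the numbers from the question in perspective, here is a minimal stdlib-only sketch that compares the two posted gradient lists. It assumes the gradients are float32 (PyTorch's default dtype), and the tolerances `REL_TOL` and `ABS_TOL` are illustrative choices, not recommended values.

```python
import math

# Gradient values copied from the question above.
grad_batched = [-0.012502569, -0.06345608, 0.024056617, 0.00014444088,
                0.030317063, -0.049938474, 0.0059496835]
grad_single = [-0.012502571, -0.06345608, 0.024056613, 0.00014443748,
               0.030317055, -0.049938496, 0.005949676]

# Machine epsilon of float32: 2**-23, about 1.19e-07.
# Assumption: the gradients were computed in float32.
FLOAT32_EPS = 2.0 ** -23

abs_diffs = [abs(x - y) for x, y in zip(grad_batched, grad_single)]
max_abs_diff = max(abs_diffs)
largest_grad = max(abs(x) for x in grad_batched)

# Illustrative tolerances (assumptions made for this sketch).
REL_TOL = 1e-4
ABS_TOL = 1e-7
all_close = all(
    math.isclose(x, y, rel_tol=REL_TOL, abs_tol=ABS_TOL)
    for x, y in zip(grad_batched, grad_single)
)

# How many "float32 rounding steps" (scaled by the largest gradient)
# the worst discrepancy amounts to.
scaled_diff = max_abs_diff / (FLOAT32_EPS * largest_grad)

print(f"max abs diff: {max_abs_diff:.3e}")
print(f"max abs diff / (eps * largest gradient): {scaled_diff:.2f}")
print(f"equal within tolerance: {all_close}")
```

The largest discrepancy is only a few float32 rounding steps at the scale of the largest gradient entry, which is why comparing such results with a tolerance is usually more meaningful than testing for exact equality.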

sajjad · February 6, 2021, 3:20pm

Thanks Thomas for your response.
To my knowledge, the gradient of the loss function w.r.t. an input pattern in .eval() mode is independent of the other input patterns in the mini-batch. As you mentioned, the number of summations increases with a larger batch size, but these summations can be split into independent calculations, each corresponding to an individual input pattern. Thus I would expect the calculations to result in the same gradient. Do you have any thoughts on this?

tom · February 6, 2021, 4:21pm

The crucial point is that this isn’t about the “theoretical maths” but about numerical precision.

So this:

import torch

a = torch.randn(5, 5)
b = torch.zeros(10, 10)  # larger tensor, zero everywhere except the top-left block
b[:5, :5] = a
a.sum() - b.sum()

will typically give something like tensor(-5.3644e-07).
Even
a.sum() - b[:5,:5].sum()
will usually not give exactly zero.

This is because the PyTorch functions make no guarantee about the order in which the 25 nonzero elements are summed, and the result actually depends on that order due to limited numerical precision.
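The order dependence described above can be reproduced without PyTorch. The following is a minimal sketch in plain Python, which uses double precision rather than PyTorch's default float32, so the effect is the same in kind but smaller in size. The helper `naive_sum`, the seed, and the list length are hypothetical choices made for illustration.

```python
import math
import random

# Three values whose sum depends on the order of addition.
big, small = 1e16, 1.0

left_to_right = (big + small) + (-big)  # small is rounded away when added to big
reordered = (big + (-big)) + small      # big cancels first, so small survives

print(left_to_right, reordered)  # prints "0.0 1.0"


def naive_sum(xs):
    """Plain left-to-right accumulation (hypothetical helper for this sketch)."""
    total = 0.0
    for x in xs:
        total += x
    return total


# The same numbers, summed in two different orders.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
shuffled = values[:]
rng.shuffle(shuffled)

naive_a = naive_sum(values)
naive_b = naive_sum(shuffled)

# math.fsum returns the correctly rounded sum, so it does not depend on order.
exact_a = math.fsum(values)
exact_b = math.fsum(shuffled)

print(naive_a - naive_b)  # tiny, and usually not exactly zero
print(exact_a - exact_b)  # 0.0
```

A GPU reduction kernel behaves like the shuffled case: the set of numbers is the same, but the order (and grouping) in which they are combined can change with the tensor shape, so the last few bits of the result can change too.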

Best regards

Thomas

sajjad · February 6, 2021, 6:34pm

Thanks Thomas for the clear response. I understand.
Best
Sajjad