The precision problem during backpropagation

May I ask why the results of \partial L/\partial z_1 and \partial L/\partial z_2 are not strictly equal?

import torch
import torch.nn as nn
input = torch.rand(3, 3, 256, 256)
label = torch.LongTensor([1, 2, 3])

relu = nn.ReLU(inplace=False)
Conv = nn.Conv2d(3, 16, 3, padding=0, bias=False)
bn = nn.BatchNorm2d(16, eps=1e-2)
global_pooling = nn.AdaptiveAvgPool2d(1)
classifier = nn.Linear(16, 5, bias=False)
criterion = nn.CrossEntropyLoss()

# z[0] scales the input of the linear classifier, z[1] scales its output
z = torch.FloatTensor([1, 1]).requires_grad_()
out = relu(input)
out = Conv(out)
out = bn(out)
out = global_pooling(out)
logits = classifier(out.view(out.size(0), -1) * z[0]) * z[1]

error_loss = criterion(logits, label)
error_loss.backward()

torch.set_printoptions(precision=10)
print(z.grad)
torch.set_printoptions(precision=20)
print(z.grad)

If we have f(x) = Wx, then from the chain rule we can derive \partial L/\partial x \cdot x = \partial L/\partial f(x) \cdot f(x), where \cdot denotes the sum of element-wise products. At z = (1, 1), the left-hand side is exactly \partial L/\partial z_1 and the right-hand side is exactly \partial L/\partial z_2, so the two gradients should be equal.
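For reference, here is a minimal standalone sketch that checks this identity numerically. It uses a hypothetical toy linear layer (the names W, x, fx, and the squared-sum loss are just illustrative, not the network above):

import torch

torch.manual_seed(0)

# Hypothetical toy example: f(x) = W x with scalar loss L = sum(f(x)**2)
W = torch.randn(4, 3)
x = torch.randn(3, requires_grad=True)
fx = W @ x
fx.retain_grad()          # keep the gradient of the intermediate tensor
L = (fx ** 2).sum()
L.backward()

# For a linear map, sum_j dL/dx_j * x_j == sum_i dL/df(x)_i * f(x)_i
lhs = (x.grad * x.detach()).sum()
rhs = (fx.grad * fx.detach()).sum()
print(lhs.item(), rhs.item())   # equal only up to float32 rounding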

I get the following result: tensor([0.00013380237214732915, 0.00013380234304349869]), and I cannot understand why the two values differ.

Hi,

I did not check your code in detail, but given your last message, I think the answer is simply that single-precision floats are only precise up to 6-7 significant digits. The two numbers you show agree up to that precision, so as float32 values they are equal.
You can use double-precision numbers if you want more precise values.
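For example, a minimal sketch of your snippet in float64 (one way to do this is to set the default dtype before building the modules; everything else is your original code):

import torch
import torch.nn as nn

# Sketch: same setup as above, but computed in double precision
torch.set_default_dtype(torch.float64)

input = torch.rand(3, 3, 256, 256)
label = torch.LongTensor([1, 2, 3])

relu = nn.ReLU(inplace=False)
Conv = nn.Conv2d(3, 16, 3, padding=0, bias=False)
bn = nn.BatchNorm2d(16, eps=1e-2)
global_pooling = nn.AdaptiveAvgPool2d(1)
classifier = nn.Linear(16, 5, bias=False)
criterion = nn.CrossEntropyLoss()

z = torch.tensor([1.0, 1.0], requires_grad=True)
out = global_pooling(bn(Conv(relu(input))))
logits = classifier(out.view(out.size(0), -1) * z[0]) * z[1]
criterion(logits, label).backward()

torch.set_printoptions(precision=20)
print(z.grad)  # the two entries should now agree to many more digits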


Thanks, I will look into this!