The precision problem during backpropagation

May I ask why the results of \partial L/\partial z_1 and \partial L/\partial z_2 are not strictly equal?

import torch
import torch.nn as nn
input = torch.rand(3, 3, 256, 256)
label = torch.LongTensor([1, 2, 3])

relu = nn.ReLU(inplace=False)
Conv = nn.Conv2d(3, 16, 3, padding=0, bias=False)
bn = nn.BatchNorm2d(16, eps=1e-2)
global_pooling = nn.AdaptiveAvgPool2d(1)
classifier = nn.Linear(16, 5, bias=False)
criterion = nn.CrossEntropyLoss()

# z[0] scales the input of the linear classifier, z[1] scales its output
z = torch.FloatTensor([1, 1]).requires_grad_()
out = relu(input)
out = Conv(out)
out = bn(out)
out = global_pooling(out)
logits = classifier(out.view(out.size(0), -1) * z[0]) * z[1]

error_loss = criterion(logits, label)
error_loss.backward()

torch.set_printoptions(precision=10)
print(z.grad)
torch.set_printoptions(precision=20)
print(z.grad)

If we have f(x) = Wx, then from the chain rule we can derive \partial L/\partial x \cdot x = \partial L/\partial f(x) \cdot f(x), where \cdot denotes the sum of element-wise products. At z = (1, 1), the left-hand side is exactly \partial L/\partial z_1 and the right-hand side is exactly \partial L/\partial z_2, so the two gradients should be equal.
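For reference, here is a minimal standalone sketch that checks this identity numerically. It uses a hypothetical toy linear layer (the names W, x, fx, and the squared-sum loss are just illustrative, not the network above):

import torch

torch.manual_seed(0)

# Hypothetical toy example: f(x) = W x with scalar loss L = sum(f(x)**2)
W = torch.randn(4, 3)
x = torch.randn(3, requires_grad=True)
fx = W @ x
fx.retain_grad()          # keep the gradient of the intermediate tensor
L = (fx ** 2).sum()
L.backward()

# For a linear map, sum_j dL/dx_j * x_j == sum_i dL/df(x)_i * f(x)_i
lhs = (x.grad * x.detach()).sum()
rhs = (fx.grad * fx.detach()).sum()
print(lhs.item(), rhs.item())   # equal only up to float32 rounding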

I get the following result: tensor([0.00013380237214732915, 0.00013380234304349869]), and I cannot understand why the two values differ.

Hi,

I did not check your code in detail, but given your last message, I think the answer is simply that single-precision floats are only precise up to 6-7 significant digits. The two numbers you show agree up to that precision, so as float32 values they are equal.
You can use double-precision numbers if you want more precise values.
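For example, a minimal sketch of your snippet in float64 (one way to do this is to set the default dtype before building the modules; everything else is your original code):

import torch
import torch.nn as nn

# Sketch: same setup as above, but computed in double precision
torch.set_default_dtype(torch.float64)

input = torch.rand(3, 3, 256, 256)
label = torch.LongTensor([1, 2, 3])

relu = nn.ReLU(inplace=False)
Conv = nn.Conv2d(3, 16, 3, padding=0, bias=False)
bn = nn.BatchNorm2d(16, eps=1e-2)
global_pooling = nn.AdaptiveAvgPool2d(1)
classifier = nn.Linear(16, 5, bias=False)
criterion = nn.CrossEntropyLoss()

z = torch.tensor([1.0, 1.0], requires_grad=True)
out = global_pooling(bn(Conv(relu(input))))
logits = classifier(out.view(out.size(0), -1) * z[0]) * z[1]
criterion(logits, label).backward()

torch.set_printoptions(precision=20)
print(z.grad)  # the two entries should now agree to many more digits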


Thanks, I will look into this!