Hi, I trained a model, and want to use it to predict in a eval and no_grad model. Strangely, I got different results if I feed the same data to the model twice. Here is the code snipper:
for data in generator:
left = data['left']
right = data['right']
print('is training': self.model.training)
data1 = self.model(left, right)
data2 = self.model(left, right)
data2 = self.predict(data2)
print(abs((data1-data2) >= 0.00001).double().sum())
and the output is like:
is training: False
tensor(10878., device=‘cuda:0’, dtype=torch.float64)
is there any idea to debug the problem ?
Small numerical mismatches are expected due to the limited numerical precision.
In your code it also seems the
abs should be applied on the subtraction, not the comparison, so you might want to fix it.
Thank you for your time.
Yes, you are right, the abs should be applied on the subtraction.
I traced the code and find that the difference came from the ConvTransposed2d module:
x1 = copy.deepcopy(x)
x1 = self.conv(x1)
x = self.conv(x)
self.conv = nn.ConvTranspose2d(in_channels, out_channels, bias=False, **kwargs)
and the output looks like:
baseicconv: tensor(158545, device=‘cuda:0’)
If, as you said, the small numerical mismatches are expected, it should be always non-zero. However, as far as I know, the difference happens occasionaly:
PS: I found when set the batch-size to 1, there is no such a problem.