How can the precision of a GPU influence the regression accuracy?

Hi, I am using PyTorch for a regression task.
Mainly I am using a neural network to approximate some target values given some inputs.
However, I found that the approximation error plateaus at around 10^-5.
For a deeper neural network it stays the same.
I am wondering whether this could be the result of training with single-precision floating-point numbers?

This sounds like a float precision issue.
You'll see the same kind of discrepancy just from changing the order in which the operations are executed:

import torch

torch.set_printoptions(precision=10)

x = torch.randn(100, 10, 100)
x1 = x.sum()                    # reduce all elements in one call
x2 = x.sum(0).sum(0).sum(0)     # reduce one dimension at a time
print(torch.abs(x1 - x2))
> tensor(0.0001220703)
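
For reference (a rough sketch, the numbers vary from run to run): the gap above is in line with float32 machine epsilon, and repeating the same reduction in float64 leaves a far smaller discrepancy:

import torch

# Machine epsilon: the smallest relative spacing around 1.0 for each dtype.
print(torch.finfo(torch.float32).eps)   # ~1.19e-07
print(torch.finfo(torch.float64).eps)   # ~2.22e-16

# The same reduction carried out in float64.
x64 = torch.randn(100, 10, 100, dtype=torch.float64)
print(torch.abs(x64.sum() - x64.sum(0).sum(0).sum(0)))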

Thanks a lot!
Yeah, I found that even with “precision=16” the printed error is still on the order of 10^-5.
Is there anything I can do to make this more accurate?

You could try using double (float64) instead of float (float32), but your GPU performance will drop.
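
For example, something along these lines (a minimal sketch with a hypothetical toy model, assuming a CUDA device is available):

import torch
import torch.nn as nn

# Hypothetical regression model, just for illustration.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

# Cast the parameters/buffers and the data to float64 before training.
model = model.double().cuda()
inputs = torch.randn(32, 10, dtype=torch.float64, device="cuda")
targets = torch.randn(32, 1, dtype=torch.float64, device="cuda")

loss = nn.MSELoss()(model(inputs), targets)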

Thank you! I am training the network with double now. Training became slower and harder. The loss did decrease, but it still seems to stop improving once it reaches the order of 10^-6. Maybe the learning rate needs to be tuned more carefully.
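
In case it is useful to others, this is the kind of schedule I am experimenting with (a minimal sketch with a hypothetical toy model, not my actual training code):

import torch
import torch.nn as nn

# Hypothetical toy setup, just to show the scheduler wiring.
model = nn.Linear(10, 1).double()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=10)

x = torch.randn(256, 10, dtype=torch.float64)
y = torch.randn(256, 1, dtype=torch.float64)

for epoch in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # reduce the LR when the loss plateaus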