[nonissue] Autograd fails when using half-precision - overflow on matrix size


(Dimitry Pletnikov) #1

I got Titan V and have been experimenting with half-precision.

In half-precision mode I can’t backpropagate through a matmul of two all-zeros matrices, because the number of elements in the resulting matrix is outside the half-precision range.

I get the same error if I use Conv1d, Conv2d, or bmm.

This minimal computation graph replicates the problem:

import torch, torch.autograd, torch.nn, numpy

with torch.cuda.device(0):

    test_input = torch.autograd.Variable(torch.zeros(257, 509)).cuda().half()
    test_w = torch.nn.Parameter(torch.zeros(509, 263)).cuda().half()

    # The result is 257 x 263 = 67591 elements
    matmul_result = torch.matmul(test_input, test_w)
    print(matmul_result.size())
    print(numpy.prod(matmul_result.size()))

    # Reduce to a scalar loss and backpropagate
    test_output = matmul_result.abs().mean()
    test_output.backward()

And the result is:

torch.Size([257, 263])
67591
Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/home/dzmitry/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/dzmitry/miniconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: value cannot be converted to type Half without overflow: 67591

I am using PyTorch 0.3 and could replicate the issue with CUDA 8 and CUDA 9.
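
For context on the 67591 in the error: it is simply larger than the biggest finite float16 value, which can be checked with numpy (already imported in the snippet above):

    import numpy
    # Largest finite value representable in float16
    print(numpy.finfo(numpy.float16).max)  # 65504.0, so 67591 overflows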


(Dimitry Pletnikov) #2

I found my mistake. It actually fails because I use .mean(), whose backward pass divides the incoming gradient by the total number of elements. That count is 67591 here, which is above 65504, the largest value representable in half precision, so converting it to Half overflows.
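
One way around it is to keep only the matmul in half precision and do the reduction in single precision. A minimal sketch, reusing the snippet from post #1 (the .float() cast is my addition; it should be recorded by autograd, so gradients still flow back to the half tensors):

    # ... same setup as in post #1 ...
    matmul_result = torch.matmul(test_input, test_w)

    # Reduce in float32: mean's backward divides by numel() = 67591,
    # which cannot be represented in float16 (max 65504)
    test_output = matmul_result.float().abs().mean()
    test_output.backward()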


(Tensor8) #3

@dimitry12 so is the recommended workaround just to replace .mean() with .sum()?
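
(For reference, a sketch of what that replacement might look like, reusing the names from post #1 and untested here: .sum() sidesteps the divide-by-numel in mean's backward, but it also scales the loss by the element count, so multiplying by a small reciprocal restores the mean without ever materializing 67591 in half precision.)

    # Hypothetical .sum()-based workaround: never convert numel() = 67591 to Half
    n = matmul_result.numel()                      # 67591
    test_output = matmul_result.abs().sum() * (1.0 / n)
    test_output.backward()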