Hi all, I am trying to compute the pseudoinverse (Moore-Penrose inverse) of a matrix using tensor.pinverse() in mixed-precision mode, but I get this error:
RuntimeError: linalg_pinv(Half{[10, 10]}): expected a tensor with 2 or more dimensions of float, double, cfloat or cdouble types
Could someone please guide me on how to fix this problem?
Thanks in advance.
I guess we don't have this implemented atm. Such functions are usually pretty sensitive to numerical precision, so they might be very unstable with fp16 inputs.
Would it work for you to convert the input to fp32 before calling pinv? If not, could you open an issue on GitHub asking for support for that function?
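Something like this, for instance (an untested sketch; A stands in for your fp16 input):

import torch

A = torch.randn(10, 10, device='cuda', dtype=torch.float16)  # placeholder for your input

# Upcast to fp32 just for the pseudoinverse; downcast afterwards if you need fp16.
A_pinv = torch.linalg.pinv(A.float())
A_pinv_half = A_pinv.half()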
Hi @albanD, thanks for your advice. I used this block for my training:
with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=True):
    # my code...
When tensor.pinverse() was called, the tensor was converted to fp32, but I got this error during backward:
self.scaler.scale(loss).backward()
  File "/usr/local/lib/python3.9/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function 'EigBackward0' returned nan values in its 0th output.
I'm afraid this is exactly the kind of issue you would get from what I was saying above: these functions are very sensitive to numerical precision.
I would first check that it runs fine without autocast, in fp32 precision. And if it does, you might have to autocast only the other pieces of your network and not this one.
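Roughly like this (a sketch only; model_part1, model_part2, and x are placeholders for your own modules and inputs):

import torch

with torch.autocast(device_type='cuda', dtype=torch.float16):
    h = model_part1(x)  # runs under autocast, fp16 where eligible

    # Locally disable autocast for the numerically sensitive op.
    with torch.autocast(device_type='cuda', enabled=False):
        # autocast may have produced fp16 activations, so upcast explicitly.
        p = torch.linalg.pinv(h.float())

    out = model_part2(p)  # back under autocast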
I need to compute the pseudoinverse of a matrix in fp16 mode because I have limited GPU memory for my network; when I use fp32, I get a CUDA out-of-memory error.
Linear algebra operations are often sensitive to numerical precision (which is why, e.g., TF32 is not used in torch.linalg), and the failure might thus be expected. If you don't have enough device memory, you might need to move the tensor to the CPU instead.
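A minimal sketch of that offload (the helper name is mine, and the device round trip will be slow since every call synchronizes with the GPU):

import torch

def pinv_on_cpu(A):
    # Move to CPU in fp32, compute the pseudoinverse there, then move back.
    # .cpu()/.float()/.to() are tracked by autograd, so gradients still flow.
    return torch.linalg.pinv(A.cpu().float()).to(A.device)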
Your process is most likely also killed by the OS as it's running out of host RAM, so your system also doesn't seem to have enough RAM to perform the operation on the CPU.
pinv is quite literally the most unstable operation in the codebase. In almost all scenarios you don't want to compute the pseudoinverse directly; rather, you want to compute lstsq (although we don't have support for non-full-rank matrices on GPU) or solve.
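For example, if the pinv is only there to solve a least-squares system A x ≈ b, something along these lines is usually better conditioned (shapes made up for illustration):

import torch

A = torch.randn(100, 10)  # tall, full-column-rank example
b = torch.randn(100, 3)

# Instead of materializing the pseudoinverse and multiplying...
x_pinv = torch.linalg.pinv(A) @ b

# ...solve the least-squares problem directly.
x_lstsq = torch.linalg.lstsq(A, b).solution

print(torch.allclose(x_pinv, x_lstsq, atol=1e-4))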
If you really need to use pinv, there is no way it's going to compute accurate results in half precision; it will already struggle in float32. In this case, I would suggest either making your model smaller or getting an instance with a GPU with 2x the memory. Now, as Piotr mentioned, if you can't run your model on the CPU either, it looks like there are some fundamental issues with your model, as it may be using much more memory than you think it is. If this is the case, running it in half precision will not solve these issues.