Sparse tensor is not supported by aten::ne

rrkarim · May 5, 2020, 9:22am

How can I find which operation in my code uses the aten::ne op in its backward pass, given that aten::ne doesn’t support sparse tensors? Is there a way to debug the backward pass?

I get an exception that I can’t trace when I try to backpropagate gradients:
RuntimeError: Could not run 'aten::ne.Tensor' with arguments from the 'SparseCUDATensorId' backend. 'aten::ne.Tensor' is only available for these backends: [CUDATensorId, QuantizedCPUTensorId, CPUTensorId, VariableTensorId].

Maybe there is a list of Python torch ops that use aten::ne in their C++ backward implementation?

Unfortunately, I can’t produce a simple code snippet for testing, because the issue is not about sparse tensor support in aten::ne itself but about the Python op that uses it. Still, the code I’m working with is the PyTorch implementation of NeRF: yenchenlin/nerf-pytorch. I’m just inserting a sparse nn.Embedding into the module list at the line here/line#83.

albanD · May 5, 2020, 2:10pm

Hi,

You can try to enable anomaly mode: https://pytorch.org/docs/stable/autograd.html#torch.autograd.detect_anomaly

This will show you which forward function is causing the error in the backward.
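For reference, a minimal sketch of what anomaly mode does in practice. The tensor `x` and the use of `torch.sqrt` on a negative value are only an illustration chosen to trigger a nan in the backward, they are not taken from the code discussed in this thread:

```python
import torch

# Illustrative input: sqrt of a negative number gives nan in the forward,
# and the sqrt backward then produces a nan gradient.
x = torch.tensor([-1.0, 4.0], requires_grad=True)

anomaly_message = None
with torch.autograd.detect_anomaly():
    y = torch.sqrt(x)
    try:
        y.sum().backward()
    except RuntimeError as err:
        # Anomaly mode raises as soon as a backward function returns nan,
        # and also prints the traceback of the forward call that created it.
        anomaly_message = str(err)

print(anomaly_message)
```

Without the context manager the same backward would finish silently and leave a nan in `x.grad`.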

rrkarim · May 5, 2020, 3:13pm

detect_anomaly doesn’t help here: the stack trace stops at the sparse embedding forward call. It doesn’t give me any new information, since I already know that the problem is caused by sparse tensors.

albanD · May 5, 2020, 3:20pm

When you say sparse embedding, do you mean nn.Embedding with sparse=True?
If so, that means it generates a sparse gradient, so the op just before it needs to be able to handle such a gradient properly.
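A small sketch to make the sparse gradient visible. The embedding size and the indices are arbitrary values picked for the illustration:

```python
import torch
import torch.nn as nn

# Arbitrary sizes, for illustration only.
emb = nn.Embedding(num_embeddings=10, embedding_dim=3, sparse=True)
idx = torch.tensor([1, 4, 4])

emb(idx).sum().backward()

# With sparse=True the gradient of the weight is a sparse tensor that
# only stores rows for the indices that were looked up.
grad_is_sparse = emb.weight.grad.is_sparse
print(grad_is_sparse)
print(emb.weight.grad)
```

Any code that later touches `emb.weight.grad` (an optimizer, a hook, a debugging check) has to support sparse tensors, which is where errors like the one above tend to come from.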

rrkarim · May 5, 2020, 3:23pm

Yeah, but the op before it is the backward for nn.BatchNorm1d, which should support sparse gradients. Ok, I guess I need to dig a little more then. Thanks so much. I’ll post here if I find anything interesting.

rrkarim · May 5, 2020, 3:30pm

Are there any difficulties in supporting sparse tensors in aten::ne? If not, maybe I can open a PR for it later.

albanD · May 5, 2020, 3:34pm

Hi,

I don’t think there is any specific reason. And we would be happy to accept a PR to increase sparse support.

But I would say that many functions don’t support sparse Tensors, so you might run into more trouble after adding aten::ne. You might want to check before.

rrkarim · May 5, 2020, 4:50pm

So there was a hidden set_detect_anomaly flag in the code. That was the issue.
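For anyone hitting the same thing, a sketch of how to check for the flag and switch it off before running the backward. The small embedding here is a stand-in for illustration, not the actual nerf-pytorch model:

```python
import torch
import torch.nn as nn

# Check whether some other part of the code turned anomaly mode on.
print(torch.is_anomaly_enabled())

# Explicitly disable it before the backward pass.
torch.autograd.set_detect_anomaly(False)

# Stand-in module for illustration only.
emb = nn.Embedding(10, 3, sparse=True)
emb(torch.tensor([0, 2])).sum().backward()

print(emb.weight.grad.is_sparse)
```

Searching the code base for `set_detect_anomaly` and `detect_anomaly` is the quickest way to find where the flag gets enabled.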

albanD · May 5, 2020, 5:45pm

Note that if you use the latest version, this should have been fixed: https://github.com/pytorch/pytorch/issues/28649

qiuli380 · August 27, 2020, 8:32am

Hi, I have the same problem when using the GAT framework: RuntimeError: Could not run 'aten::gt.Scalar' with arguments from the 'SparseCPUTensorId' backend. 'aten::gt.Scalar' is only available for these backends: [CPUTensorId, QuantizedCPUTensorId, VariableTensorId]. Can you tell me how to solve this problem?

qiuli380 · August 27, 2020, 8:53am

Hi, what does set_detect_anomaly mean? What is it doing in the code, and how can I solve it?

albanD · August 27, 2020, 2:11pm

It is used to find nan and other issues in the backward pass.
The detection relied on specific functions that were not supported for sparse Tensors, hence the issue.
But your error is not related, as we don’t use gt() in there.

This error just means that this function is not implemented for sparse Tensors, so you cannot use it on them.
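A possible workaround, sketched under the assumption that the comparison is applied to something like a sparse adjacency matrix (the name `adj` and its values are made up for the example): either convert to dense before comparing, or compare only the stored values and rebuild the sparse tensor.

```python
import torch

# Hypothetical sparse matrix standing in for the tensor that hits gt().
indices = torch.tensor([[0, 1, 2], [1, 2, 0]])
values = torch.tensor([0.5, -1.0, 2.0])
adj = torch.sparse_coo_tensor(indices, values, size=(3, 3))

# Option 1: densify first, then compare. Simple, but uses dense memory.
dense_mask = adj.to_dense() > 0

# Option 2: compare only the stored values and keep the result sparse.
adj = adj.coalesce()  # indices() and values() require a coalesced tensor
value_mask = adj.values() > 0
positive = torch.sparse_coo_tensor(
    adj.indices()[:, value_mask], adj.values()[value_mask], adj.shape
)

print(dense_mask)
print(positive.to_dense())
```

Option 1 is usually fine for small matrices. Option 2 avoids materializing the dense matrix, but it only looks at explicitly stored entries, so it matches the dense result only when implicit zeros should fail the comparison, as they do for `> 0`.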