How can I find the operation in my code that uses the aten::ne op in backward, which doesn’t support sparse tensors? Is there any method for debugging the backward pass?
I get an exception I can’t trace when I’m trying to backpropagate gradients:
RuntimeError: Could not run 'aten::ne.Tensor' with arguments from the 'SparseCUDATensorId' backend. 'aten::ne.Tensor' is only available for these backends: [CUDATensorId, QuantizedCPUTensorId, CPUTensorId, VariableTensorId].
Maybe there is a list of Python torch ops that use aten::ne in their backward C++ implementation?
Unfortunately, I can’t produce a simple code snippet for testing, because the issue is not about sparse tensor support in aten::ne itself but about the Python op that uses it. Still, the code that I’m trying to work with is the PyTorch implementation of NeRF: yenchenlin/nerf-pytorch. I’m just inserting a sparse nn.Embedding in the module list on the line here/line#83.
You can try to enable anomaly mode: https://pytorch.org/docs/stable/autograd.html#torch.autograd.detect_anomaly
This will show you which forward function is causing the error in the backward.
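To illustrate what anomaly mode reports, here is a minimal sketch. It is not taken from the NeRF code; the toy computation (a square root at zero multiplied by zero) is just an assumption chosen because its backward produces a NaN, which anomaly mode turns into an error naming the backward function.

```python
import torch


def backward_under_anomaly_mode():
    """Run a backward pass that produces NaN and return the error text."""
    x = torch.tensor([0.0], requires_grad=True)
    with torch.autograd.detect_anomaly():
        # d/dx sqrt(x) at x = 0 is inf; the incoming gradient is 0,
        # so the sqrt backward computes 0 / 0 = nan.
        y = (torch.sqrt(x) * 0).sum()
        try:
            y.backward()
        except RuntimeError as err:
            return str(err)
    return None


message = backward_under_anomaly_mode()
print(message)
```

Along with the error, anomaly mode prints the traceback of the forward call that created the failing backward node, which is usually the part you want.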
detect_anomaly doesn’t help here: the stack trace stops at the sparse embedding forward fn call. It doesn’t provide any new info, since I already know that the problem is caused by sparse tensors.
When you say sparse embedding, do you mean nn.Embedding with sparse=True?
If so, that means that it generates a sparse gradient. So the op just before it needs to be able to handle such a gradient properly.
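A quick way to see the difference is to compare the weight gradient of a sparse and a dense embedding. This is only a small sketch; the sizes and indices are arbitrary values picked for illustration.

```python
import torch
import torch.nn as nn

indices = torch.tensor([1, 3, 3])

# Same layer twice, differing only in the `sparse` flag.
sparse_emb = nn.Embedding(10, 4, sparse=True)
dense_emb = nn.Embedding(10, 4, sparse=False)

sparse_emb(indices).sum().backward()
dense_emb(indices).sum().backward()

# The gradient layout depends on the flag, not on the forward output.
print(sparse_emb.weight.grad.is_sparse)
print(dense_emb.weight.grad.is_sparse)
```

Anything that later touches `sparse_emb.weight.grad` (an optimizer, a hook, a NaN check) has to support the sparse layout.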
Yeah, but the op before is the backward for nn.BatchNorm1d, which should support sparse gradients. OK, I guess I need to dig a little more then. Thanks so much. I will write here if I find something interesting.
Are there any difficulties in supporting sparse tensors in aten::ne? If not, maybe I can open a PR for it later?
I don’t think there is any specific reason. And we would be happy to accept a PR to increase sparse support.
But I would say that many functions don’t support sparse Tensors, so you might run into more trouble after adding aten::ne; you might want to check before.
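One rough way to check is to call each op you rely on with a tiny sparse tensor and see whether it raises. The helper name `runs_on_sparse` and the list of candidate ops below are hypothetical, and the results depend on your PyTorch version, so treat this as a sketch rather than a definitive support table.

```python
import torch


def runs_on_sparse(op):
    """Return True if `op` accepts a small sparse COO tensor without raising."""
    sparse_input = torch.eye(3).to_sparse()
    try:
        op(sparse_input)
        return True
    except RuntimeError:
        # Missing-kernel errors are RuntimeError (or its subclass
        # NotImplementedError, depending on the PyTorch version).
        return False


# Hypothetical list of ops to probe; replace with the ones your model uses.
candidates = {
    "ne": lambda t: t.ne(t),
    "gt": lambda t: t.gt(0),
    "to_dense": lambda t: t.to_dense(),
}
for name, op in candidates.items():
    print(name, runs_on_sparse(op))
```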
So there was a hidden set_detect_anomaly flag in the code. That was the issue.
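For anyone hitting the same thing, a sketch of how to check whether anomaly mode was switched on somewhere in third-party code and turn it off again. The function name `find_and_disable_anomaly_mode` is just an illustrative helper, not something from the NeRF repository.

```python
import torch


def find_and_disable_anomaly_mode():
    """Report whether anomaly mode was on globally, then turn it off."""
    was_enabled = torch.is_anomaly_enabled()
    torch.autograd.set_detect_anomaly(False)
    return was_enabled


# Call this after importing the model code and before the training loop.
print(find_and_disable_anomaly_mode())
```

Searching the repository for `set_detect_anomaly` and `detect_anomaly` finds the place where the flag is set.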
Note that if you use the latest version, this should have been fixed: https://github.com/pytorch/pytorch/issues/28649
Hi, I have the same problem when using the GAT framework: RuntimeError: Could not run ‘aten::gt.Scalar’ with arguments from the ‘SparseCPUTensorId’ backend. ‘aten::gt.Scalar’ is only available for these backends: [CPUTensorId, QuantizedCPUTensorId, VariableTensorId]. Can you tell me how to solve this problem?
Hi, what does set_detect_anomaly mean? What’s in the code, and how can I solve it?
It is used to find nan and other issues in the backward pass.
It was using specific functions to do that detection that were not supported by sparse Tensors, hence the issue.
But your error is not related, as we don’t use gt() in there.
This error just means that this function is not implemented for sparse Tensors and you cannot use it.
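If you still need the comparison, two common workarounds are to densify the tensor first or to compare only the stored values. The sketch below uses a made-up 2x2 tensor; whether either option fits depends on your tensor sizes and on whether the implicit zeros matter for your comparison.

```python
import torch

# Toy sparse tensor standing in for the real one (values are made up).
sparse = torch.tensor([[0.0, 2.0], [-1.0, 0.0]]).to_sparse()

# Option 1: convert to dense, then compare. Simple, but it materializes
# the full tensor, which may not fit in memory for large inputs.
dense_mask = sparse.to_dense() > 0

# Option 2: compare only the explicitly stored values. This skips the
# implicit zeros, so it matches the dense result only when comparing a
# zero would give False anyway (as with "> 0").
coalesced = sparse.coalesce()
stored_mask = coalesced.values() > 0

print(dense_mask)
print(coalesced.indices(), stored_mask)
```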