I would like to be able to sparsify a single (e.g. the last) dimension of an N-dimensional dense tensor, because I need to manipulate the resulting sparse tensor using ordinary pytorch operations. An example of an operation I need to perform on the tensor is normaliseTensor (min/max along a dimension are unavailable for sparse tensors).
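For reference, a minimal sketch of the kind of operation I mean by normaliseTensor (the exact implementation in my code differs; this min-max normalisation along a dimension just illustrates why per-dimension min/max is required):

```python
import torch

def normalise_tensor(t, dim=-1, eps=1e-8):
    # Min-max normalisation along `dim`: requires min(dim)/max(dim),
    # which dense tensors support but pytorch sparse tensors do not.
    t_min = t.min(dim=dim, keepdim=True).values
    t_max = t.max(dim=dim, keepdim=True).values
    return (t - t_min) / (t_max - t_min + eps)

x = torch.randn(4, 5, 6)
y = normalise_tensor(x)   # values lie in [0, 1] along the last dim
```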

min(dim)/max(dim) could be easily implemented using a hybrid dense/sparse tensor (where only the last dimension is sparsified).

Ideally the user should be able to select a dimension along which a tensor is sparsified (e.g. the last dimension of an N-dimensional dense tensor), and all the sparse matrix operations/optimisations should work under the hood with no change to the pytorch interface (hybrid tensors should look like dense tensors in pytorch), hardware permitting.

Perhaps there is a way the existing pytorch sparse tensor types can be manipulated to emulate this functionality?
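One partial emulation I have considered with the existing COO type (only a sketch, since COO hybrid tensors place the sparse dimensions first rather than last; the helper names below are my own): flatten the dense dims into a single batch dim and store a 2-D sparse tensor.

```python
import torch

# Partial emulation with the existing COO type: flatten the dense dims
# into one batch dim and keep a 2-D sparse COO tensor of shape (B, D).
def to_batched_sparse(t):
    shape = t.shape
    return t.reshape(-1, shape[-1]).to_sparse(), shape

def from_batched_sparse(sp, shape):
    return sp.to_dense().reshape(shape)

x = torch.zeros(2, 3, 4)
x[0, 1, 2] = 5.0
sp, shape = to_batched_sparse(x)
assert torch.equal(from_batched_sparse(sp, shape), x)
```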

The problem could be solved with a batched sparse tensor implementation (a hybrid tensor where the first dimension is dense and the other(s) are sparse):

To multiply a dense tensor (N-1 dims) with the batched sparse representation (2 dims: 1 dense, 1 sparse) of the underlying hybrid tensor (N dims: N-1 dense, 1 sparse), the dense tensor could temporarily be reshaped to the sparse batch size (1 dim).
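Concretely, the reshape-and-multiply step might look like this (a sketch built on the existing COO type; flattening the N-1 dense dims into one batch dim is my assumption about the layout):

```python
import torch

x = torch.zeros(2, 3, 4)
x[0, 1, 2] = 5.0
x[1, 2, 0] = 3.0

sp = x.reshape(-1, 4).to_sparse().coalesce()   # batched sparse representation: (6, 4)
scale = torch.arange(1., 7.)                   # dense (N-1)-dim tensor, reshaped to (B,)

# Multiply each stored value by the scale factor of its batch row.
idx, val = sp.indices(), sp.values()
scaled = torch.sparse_coo_tensor(idx, val * scale[idx[0]], sp.shape)

y = scaled.to_dense().reshape(2, 3, 4)         # y[0, 1, 2] == 10., y[1, 2, 0] == 18.
```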

The hybrid dense-sparse tensor could then be reshaped back to its original dense shape (N-1 dims) from its batched sparse representation (2 dims: 1 dense, 1 sparse) once the sparse-dimension operations have been performed and the sparse dimension has been reduced away.
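The reshape-back step might look like this (again a sketch on the existing COO type, using torch.sparse.sum as an example reduction over the sparse dimension):

```python
import torch

x = torch.zeros(2, 3, 4)
x[0, 1, 2] = 5.0
x[1, 2, 3] = 3.0

sp = x.reshape(-1, 4).to_sparse()      # batched sparse representation: (6, 4)

# Reduce away the sparse dimension, then restore the (N-1)-dim dense shape.
reduced = torch.sparse.sum(sp, dim=1)  # still sparse along the batch dim
y = reduced.to_dense().reshape(2, 3)   # y[0, 1] == 5., y[1, 2] == 3.
```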

It appears that batched sparse tensors have never been implemented in pytorch.

B. hybrid dense-sparse tensor implementations

A possible implementation of a hybrid dense-sparse tensor would be a dense tensor whose entries are integer pointers to sparse tensors (each sparse tensor contains elements with value/index pairs, as in the current pytorch sparse tensor implementation).
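A toy sketch of that pointer-table layout (the class and method names are hypothetical, purely to illustrate the idea):

```python
import torch

class HybridSparse:
    # Dense tensor of integer pointers into a table of sparse rows;
    # each row stores (indices, values) along the sparsified last dim.
    def __init__(self, dense_shape, last_dim):
        self.ptr = torch.full(dense_shape, -1, dtype=torch.long)  # -1 = all-zero row
        self.rows = []
        self.last_dim = last_dim

    def set_row(self, dense_index, indices, values):
        self.ptr[dense_index] = len(self.rows)
        self.rows.append((indices, values))

    def densify_row(self, dense_index):
        # Materialise one sparse row as a dense vector.
        i = self.ptr[dense_index].item()
        out = torch.zeros(self.last_dim)
        if i >= 0:
            indices, values = self.rows[i]
            out[indices] = values
        return out

h = HybridSparse((2, 3), last_dim=4)
h.set_row((0, 1), torch.tensor([2]), torch.tensor([5.0]))
```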

When the user requests a particular subset of the tensor (i.e. any operation that requires indexing dense dimensions/batch samples of the tensor, e.g. t[3] or torch.min(t, dim=1)), it would perform an effective gather (single- or multi-index lookup) operation based on the indices, collecting the relevant sparse tensors (elements).
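Under that layout, t[3]-style indexing reduces to a lookup over the pointer tensor (a standalone sketch; the pointer-table representation and names are my assumptions):

```python
import torch

# Indexing the dense dims is a lookup into the pointer tensor, which
# selects only the relevant sparse rows and leaves the rest untouched.
ptr = torch.tensor([[0, 1], [2, -1]])              # dense 2x2 grid of row pointers
rows = [(torch.tensor([0]), torch.tensor([1.0])),  # rows[i] = (indices, values)
        (torch.tensor([3]), torch.tensor([2.0])),
        (torch.tensor([1]), torch.tensor([4.0]))]

selected = ptr[1]                                  # t[1] -> pointers [2, -1]
gathered = [rows[i] if i >= 0 else None for i in selected.tolist()]
```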

Are there specific developers working on (hybrid) sparse tensor implementations?