Slice a sparse CSR tensor

I am storing a dataset as a sparse CSR tensor and I want to write a data loader which selects certain row indices and outputs a dense tensor.

Something like this:

import torch

class Dataset:
    def __init__(self, dataset: torch.sparse.FloatTensor, batch_size: int, shuffle: bool):
        self.dataset = dataset  # sparse tensor; rows are samples
        self.batch_size = batch_size
        self.inds = torch.arange(dataset.shape[0]).long()
        self.ptr = 0
        self.shuffle = shuffle

    def _reset(self):
        # reshuffle the row order at the start of each epoch
        if self.shuffle:
            self.inds = self.inds[torch.randperm(len(self.inds))]
        self.ptr = 0

    def __iter__(self):
        self._reset()
        return self

    def __next__(self):
        if self.ptr == len(self.inds):
            raise StopIteration
        next_ptr = min(len(self.inds), self.ptr + self.batch_size)

        inds = self.inds[self.ptr:next_ptr]
        # this is the call that fails for CSR tensors
        dense_tensor = self.dataset.index_select(0, inds).to_dense()
        self.ptr = next_ptr
        return dense_tensor, inds

If dataset is a COO sparse tensor, everything works fine, but if it is a CSR sparse tensor, I get the following error:

NotImplementedError: Could not run 'aten::index_select' with arguments from the 'SparseCsrCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit for possible resolutions. 'aten::index_select' is only available for these backends: [CPU, SparseCPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

Is index_select really not implemented for sparse CSR tensors? Isn’t that what CSR format is fast at? What’s the most efficient way for me to slice a sparse torch tensor… something that will work on CPU and GPU?
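In the meantime, since the CSR tensor does expose its raw compressed layout (crow_indices, col_indices, values), one workaround is to gather the requested rows by hand. This is just a sketch; csr_rows_to_dense is a hypothetical helper name, and a Python-level loop like this won't match a native kernel for speed:

```python
import torch

def csr_rows_to_dense(t: torch.Tensor, rows: torch.Tensor) -> torch.Tensor:
    """Gather `rows` from a sparse CSR tensor into a dense tensor,
    working around the missing index_select for the CSR backend."""
    crow = t.crow_indices()   # row pointers, shape (nrows + 1,)
    col = t.col_indices()     # column index of each nonzero
    vals = t.values()         # value of each nonzero
    out = torch.zeros(len(rows), t.shape[1], dtype=vals.dtype)
    for i, r in enumerate(rows.tolist()):
        start, end = crow[r].item(), crow[r + 1].item()
        out[i, col[start:end]] = vals[start:end]
    return out
```

Alternatively, if your PyTorch version supports converting a CSR tensor back to COO, you can do that conversion once and use the COO tensor's index_select, at the cost of the conversion itself.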

Related to this (Sparse tensor support for slice, reduce sum and element wise comparison?) but it’s been a while.



I’ve compared the following two strategies:

  1. keep data as a scipy.sparse.csr_matrix, do the slicing, make it dense, convert to torch, and then move to GPU
  2. keep data as a torch.sparse_coo on GPU, do the slicing, make it dense
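The two strategies can be sketched roughly as follows (the shapes, density, and row indices here are made up for illustration, and the GPU transfers are commented out so the sketch runs on CPU):

```python
import numpy as np
import scipy.sparse as sp
import torch

# toy stand-in for the real dataset
csr = sp.random(100, 300, density=0.01, format="csr", random_state=0)
inds = np.array([3, 7, 42], dtype=np.int64)

# Strategy 1: scipy CSR on CPU -> slice rows -> densify -> torch tensor
batch1 = torch.from_numpy(csr[inds].toarray())
# batch1 = batch1.to("cuda")  # move each minibatch to GPU

# Strategy 2: torch sparse COO -> index_select -> densify
coo = torch.from_numpy(csr.toarray()).to_sparse()
# coo = coo.to("cuda")  # keep the whole dataset on GPU
batch2 = coo.index_select(0, torch.from_numpy(inds)).to_dense()
```

Both paths produce the same dense minibatch; they differ only in where the slicing happens and what crosses the CPU-GPU boundary.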

Unless I’m doing something wrong, for a test sparse dataset meant to resemble single-cell RNA sequencing data (10k rows by 30k columns, with 900k nonzero elements), I see the following timings for the two strategies:

Timing a full pass (for d in dataloader: ...):

  1. takes 60ms, with data on CPU being sliced and moved to GPU for each minibatch
  2. takes 600ms, with everything on GPU

I’d really love to be able to slice a torch.sparse CSR tensor on GPU as fast as scipy.sparse can slice a CSR matrix on CPU.
