If your sizes are relatively small, then you can use something like this to create a sparse matrix using the default strided tensor layout.

```
import torch

value = torch.rand(64, 10)          # shape [64, 10]
ids = torch.randint(0, 100, (10,))  # shape [10], column indices in [0, 100), e.g. [94, 13, 20, 6, 27, 45, 15, 7, 53, 2]
sparse_tensor = torch.zeros(64, 100)
sparse_tensor[:, ids] = value
```

It will not make much of a difference in memory whether you define it like this or as a sparse tensor. For matrix multiplication you can then use `@`, `torch.matmul`, or `torch.mm`.
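As a quick sketch of the dense-layout version above (the weight matrix `w` and its size 32 are assumptions for illustration, and the example index list from the comment is used so the run is deterministic):

```
import torch

value = torch.rand(64, 10)
ids = torch.tensor([94, 13, 20, 6, 27, 45, 15, 7, 53, 2])
sparse_tensor = torch.zeros(64, 100)
sparse_tensor[:, ids] = value

w = torch.rand(100, 32)   # hypothetical dense weight matrix
out = sparse_tensor @ w   # equivalent to torch.matmul / torch.mm here
print(out.shape)          # torch.Size([64, 32])
```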

However, if you do have very large sparse matrices, then you can create either a `torch.sparse_coo_tensor` or a `torch.sparse_csr_tensor`. According to the documentation, `torch.sparse_csr_tensor` does not support CUDA, so I will show you how to build a `torch.sparse_coo_tensor` for your case.

```
import torch

value = torch.rand(64, 10)          # shape [64, 10]
ids = torch.randint(0, 100, (10,))  # shape [10], column indices in [0, 100), e.g. [94, 13, 20, 6, 27, 45, 15, 7, 53, 2]

# First you need to turn your indices into coordinates.
# There are many ways to do this, as shown in the documentation;
# here every row index is paired with every column index in `ids`:
# [[0, 94], [0, 13], [0, 20], ..., [63, 7], [63, 53], [63, 2]]
idx = [[i, int(j)] for i in range(64) for j in ids]
sparse_tensor = torch.sparse_coo_tensor(list(zip(*idx)), value.view(-1), (64, 100))
```
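Once the COO tensor exists, you can multiply it with a dense matrix via `torch.sparse.mm` (or plain `@`). A minimal sketch, where the dense matrix `w` is an assumption and the fixed index list keeps the coordinates free of duplicates:

```
import torch

value = torch.rand(64, 10)
ids = torch.tensor([94, 13, 20, 6, 27, 45, 15, 7, 53, 2])

# Build the coordinate list and the sparse COO tensor as above.
idx = [[i, int(j)] for i in range(64) for j in ids]
sparse_tensor = torch.sparse_coo_tensor(list(zip(*idx)), value.view(-1), (64, 100))

w = torch.rand(100, 32)  # hypothetical dense matrix
out = torch.sparse.mm(sparse_tensor, w)
print(out.shape)         # torch.Size([64, 32])
```

Note that with duplicate coordinates a COO tensor sums the values on coalescing, unlike the dense assignment, which overwrites.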

Also, if you want to see which operations support gradients, you can look here.