Calculating the gradient through a sparse matrix into a tensor of its values

Hello everybody!
I want to write a custom nn.Linear layer, but with the weights stored as a sparse matrix.

How expensive is it to create a sparse matrix “on the fly” on every iteration, just so that the gradient can reach the tensor of its values?
And how can the code below be optimized?

If I don’t create a new sparse matrix on each iteration of the loop, an exception is thrown after the first iteration: “RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.”

I haven’t been able to get around this error in any way.
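For what it’s worth, here is a minimal reduction (my own sketch, not your exact setup) of why the error appears when the sparse tensor is built outside the loop: the construction from `values` to the sparse tensor is itself a node in the autograd graph, and the first `backward()` frees that segment’s saved buffers, so the second iteration tries to traverse an already-freed graph:

```python
import torch

values = torch.tensor([10.0], requires_grad=True)
# Built once, OUTSIDE the loop: the path from `values` to `tensor` is part
# of the autograd graph, and its buffers are freed by the first backward().
tensor = torch.sparse_coo_tensor([[1], [0]], values, (2, 3))
inputs = torch.ones(3, 1)

for step in range(2):
    loss = torch.sparse.mm(tensor, inputs).sum()
    try:
        loss.backward()
    except RuntimeError as e:
        # Second iteration: "Trying to backward through the graph a second time ..."
        print(f"step {step}: {e}")
        break
```

Rebuilding the sparse tensor inside the loop sidesteps this because each iteration records a fresh graph from `values`.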

Code example:

import torch
import torch.optim as optim


if __name__ == '__main__':
    # One nonzero entry at row 1, column 0 of a 2x3 weight matrix.
    indices = [[1], [0]]
    values = torch.tensor([10.0], requires_grad=True)
    target = torch.tensor(
        [
            [1.0],
            [2.0]
        ]
    )

    inputs = torch.tensor(
        [
            [1.0],
            [2.0],
            [3.0]
        ]
    )

    optimizer = optim.Adam([values], lr=0.1)
    for _ in range(200):
        # The sparse tensor has to be rebuilt on every iteration so that a
        # fresh autograd graph from `values` is recorded each time.
        tensor = torch.sparse_coo_tensor(indices, values, (2, 3))

        y = torch.sparse.mm(tensor, inputs)  # sparse (2, 3) @ dense (3, 1) -> dense (2, 1)

        loss = ((target - y) ** 2).sum()
        print(y.tolist(), values)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
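If constructing the COO tensor every iteration ever turns out to be a bottleneck, one possible workaround (a sketch under the same setup as above, not benchmarked) is to skip the sparse tensor entirely and compute the product directly from the index and value arrays with dense gather/scatter ops, which autograd differentiates natively:

```python
import torch
import torch.optim as optim

# Same single nonzero as above: entry (row 1, col 0) of a 2x3 matrix.
rows = torch.tensor([1])
cols = torch.tensor([0])
values = torch.tensor([10.0], requires_grad=True)
target = torch.tensor([[1.0], [2.0]])
inputs = torch.tensor([[1.0], [2.0], [3.0]])

optimizer = optim.Adam([values], lr=0.1)
for _ in range(200):
    # Each nonzero (r, c) contributes values[k] * inputs[c] to output row r.
    contrib = values.unsqueeze(1) * inputs[cols]       # shape (nnz, 1)
    y = torch.zeros(2, 1).index_add(0, rows, contrib)  # scatter-add into rows
    loss = ((target - y) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

This builds no sparse tensor at all, so there is nothing to recreate per iteration; whether it is actually faster than `torch.sparse.mm` will depend on the number of nonzeros and should be measured.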