Sparse Gradients and Computational Efficiency

I have a use case that involves updating specific elements of a tensor while keeping the others fixed. I have achieved this using the `register_hook` method, which lets me modify the tensor's gradient during the backward pass (by zeroing the gradient entries of the elements I don't want to update). However, this approach does not exploit the sparsity of the gradients: the full gradient is still computed and only masked afterwards, resulting in unnecessary computational cost.
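For reference, a minimal sketch of what I am currently doing (the tensor name and mask are illustrative, not my actual code):

```python
import torch

# Parameter tensor; only some elements should be trainable.
w = torch.randn(4, 4, requires_grad=True)

# Boolean mask: True marks the elements that SHOULD be updated.
update_mask = torch.zeros(4, 4, dtype=torch.bool)
update_mask[0] = True  # e.g. only the first row is trainable

# The hook runs on the gradient during backward(). Note that the
# full dense gradient is still computed; it is merely zeroed out
# afterwards, which is why this does not exploit sparsity.
w.register_hook(lambda grad: grad * update_mask)

loss = (w ** 2).sum()
loss.backward()

print(w.grad[0])  # non-zero gradients for the trainable row
print(w.grad[1])  # zeros for a masked row
```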

I am seeking advice on how to improve the computational efficiency of this approach given the sparse nature of the gradients. Ideally, operations and backpropagation would be performed only on the elements of the tensor with non-zero gradients, rather than computing the full dense gradient and discarding most of it.

I would appreciate any guidance or recommendations on how to efficiently update only the desired elements of the tensor.