I have an array that has a few ones and a lot of zeros:
For context, it is the adjacency matrix of the pixels of an n*n image (here n=4), so it is n^2*n^2.
There are 4*n*(n-1) non-zero values in it in total, but only 2*n*(n-1) independent ones because the matrix is symmetric; each non-zero diagonal holds n*(n-1) values, corresponding to the vertical or horizontal adjacencies.
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0
0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0
0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
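For reference, here is a small helper (just for illustration) that generates the matrix above, flattening pixel (i, j) to index i*n + j in row-major order:

```python
import torch

def grid_adjacency(n):
    # Adjacency matrix of the 4-neighbour grid graph on an n*n image.
    A = torch.zeros(n * n, n * n)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            if j + 1 < n:                    # horizontal adjacency
                A[k, k + 1] = A[k + 1, k] = 1
            if i + 1 < n:                    # vertical adjacency
                A[k, k + n] = A[k + n, k] = 1
    return A

A = grid_adjacency(4)
print(int(A.sum()))  # 48, i.e. 4*n*(n-1) non-zero entries for n=4
```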
I use it to compute things, and I end up with a final scalar loss.
I want to compute the gradient of the loss w.r.t. each element that is 1 in the matrix, in order to optimize them.
I thought of two ways to do this, but one of them is not doable (as far as I know), and I'm afraid the second one is inefficient.
First method:
- Declare a tensor of size 2n(n-1) as a Variable with requires_grad=True
- Insert those values into an n^2*n^2 matrix (this is the part I don't know how to do)
- Do the computation and get the loss
With this method, I would only get the gradients of the values I want and no others, so no wasted computation.
The problem is, I don't know whether it's possible to insert the values into a new tensor while keeping the gradient connection. Is it possible?
The closest thing I have found is torch.index_select(), but it only selects along one dimension, which would make the process rather awkward, given that I want to select the diagonals.
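To make the first method concrete, here is the kind of insertion I have in mind. I am assuming (but am not sure) that advanced-indexing assignment into a freshly created zero tensor keeps the autograd connection back to the parameter vector (`w`, `rows`, `cols` are my own names here):

```python
import torch

n = 4
# One parameter per independent edge: 2*n*(n-1) values in total.
w = torch.rand(2 * n * (n - 1), requires_grad=True)

# Precompute the (row, col) index of each upper-triangular non-zero.
rows, cols = [], []
for i in range(n):
    for j in range(n):
        k = i * n + j
        if j + 1 < n:                 # horizontal adjacency
            rows.append(k); cols.append(k + 1)
        if i + 1 < n:                 # vertical adjacency
            rows.append(k); cols.append(k + n)
rows = torch.tensor(rows)
cols = torch.tensor(cols)

# Rebuild the dense matrix from scratch each iteration; the indexing
# assignment records an autograd op, so gradients flow back to w.
A = torch.zeros(n * n, n * n)
A[rows, cols] = w
A = A + A.t()                         # symmetrise

loss = A.sum()                        # placeholder for the real loss
loss.backward()
print(w.grad.shape)                   # torch.Size([24]): one grad per parameter
```

Note that `A` has to be rebuilt inside the loop here, since the assignment makes it a non-leaf tensor.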
Second method:
- Declare a tensor of size n^2*n^2 as a Variable with requires_grad=True
- Do the computation and get the loss
With this method, computing the gradient w.r.t. all the zero elements would be rather inefficient.
So is there a way to do a "selective requires_grad", for example with a mask, that would avoid computing those gradients (especially useful as n grows)?
I saw that I can pass the mask (which is actually the matrix above) directly to backward (loss.backward(mask)), which results in a gradient with positive values everywhere I want and zeros everywhere else. But is that done efficiently? In other words, when one of the elements of the tensor passed to backward() is zero, does PyTorch still compute the gradient for that element, even though it is going to end up being zero?
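For the masking idea, the closest thing I have found is register_hook, which multiplies the incoming gradient by the mask. As far as I can tell, this only zeroes the unwanted gradients after the fact rather than skipping their computation, which is exactly my worry (the random mask here is just a stand-in for the adjacency mask):

```python
import torch

n = 4
A = torch.rand(n * n, n * n, requires_grad=True)
mask = (torch.rand(n * n, n * n) > 0.5).float()  # stand-in for the real mask

# Multiply the incoming gradient by the mask during backward.
# The full dense gradient is still computed first, then masked.
A.register_hook(lambda g: g * mask)

loss = (A * 2).sum()
loss.backward()
# A.grad is now 2 where mask == 1 and 0 elsewhere.
```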
Finally, I have seen the torch.sparse API in the docs, and I think I could use it, but I am unsure whether I can define a sparse tensor as a leaf variable, and whether the gradient propagates correctly through to_dense().
Mostly, though, the API says "This API is currently experimental and may change in the near future.", so I'd rather not rely on it if it's going to change…
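For the sparse route, this is the kind of toy check I had in mind (tiny made-up indices; I am assuming the installed version supports autograd through the sparse constructor and to_dense()):

```python
import torch

n = 4
idx = torch.tensor([[0, 1],          # row indices of the non-zeros
                    [1, 0]])         # col indices of the non-zeros
vals = torch.tensor([1.0, 1.0], requires_grad=True)

# Build a sparse matrix from the trainable values, densify, and check
# that gradients flow back to vals.
S = torch.sparse_coo_tensor(idx, vals, (n * n, n * n))
D = S.to_dense()
loss = D.sum()
loss.backward()
print(vals.grad)
```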
I'd appreciate some help on these matters.