How to mask gradient computations in backward?


I am trying to learn a matrix E that maps a vector W to a vector C, and a matrix D that maps the vector C to a vector W2. However, I only want to update the parameters of E and D for certain (w, c) and (c, w) combinations, respectively.

In other words, I would like the backward() call on my loss to not compute any gradient for the parameters outside those allowed combinations (which I have defined in an adjacency matrix).

It seems like some type of sparse Variable would have this behavior, but I do not think that is implemented in PyTorch. Instead, does anyone know a trick for 'masking' these parameter matrices? I could simply zero out the gradients for the parameter entries that I want to ignore, or even more simply create a mask for each parameter matrix and compute, e.g., E * E_mask.

But I am not sure the gradients that remain are correct, because backprop will first compute the gradient w.r.t. every parameter before I get a chance to zero anything out.

A backward hook (register_hook) is probably what you are looking for.
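For reference, here is a minimal sketch of masking gradients with register_hook (the sizes and the E_mask name are illustrative, not from the original post):

```python
import torch

# Hypothetical sizes: W has 4 dimensions, C has 3
E = torch.randn(3, 4, requires_grad=True)
# Adjacency mask: 1 = trainable entry, 0 = frozen entry
E_mask = (torch.rand(3, 4) > 0.5).float()

# The hook runs during backward() and rewrites the gradient
# before it is accumulated into E.grad
E.register_hook(lambda grad: grad * E_mask)

W = torch.randn(4)
loss = (E @ W).sum()
loss.backward()

# Gradient entries at masked-out positions are exactly zero
assert torch.all(E.grad[E_mask == 0] == 0)
```

An optimizer step on E will then leave the masked-out entries unchanged, since their gradients are zero.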


Thanks! I think that did the trick, i.e. registering a hook function for each matrix:

def _E_hook(grad):
    # keep only the gradient entries allowed by the adjacency mask
    return grad * self.E_mask

def _D_hook(grad):
    return grad * self.D_mask

self.E.register_hook(_E_hook)
self.D.register_hook(_D_hook)

Just need to convince myself that computing the gradients of, e.g., the output w.r.t. D and then zeroing out the D gradients before backpropagating is equivalent to never computing gradients for those dimensions of D in the first place.
