I would like to use a tensor in which only some elements are trainable, i.e. updated during the backpropagation step. Consider for example:
a = torch.ones((2, 2), requires_grad=True)
b = torch.ones((2, 2), requires_grad=True)
c = torch.ones((1, 2), requires_grad=True)
x = torch.ones((2, 1))
y = c @ (b @ (a @ x))
Now for the c tensor I want the c[0, 1] element to be constant, i.e. not varied during the optimization procedure. One idea is that, after calling e.g. y.sum().backward(), I could zero the corresponding gradient element:
c.grad[0, 1] = 0
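Put together, a minimal sketch of what I have in mind (SGD and the learning rate are just placeholders, not part of my actual setup):

```python
import torch

a = torch.ones((2, 2), requires_grad=True)
b = torch.ones((2, 2), requires_grad=True)
c = torch.ones((1, 2), requires_grad=True)
x = torch.ones((2, 1))

# placeholder optimizer / learning rate
opt = torch.optim.SGD([a, b, c], lr=0.1)

y = c @ (b @ (a @ x))
y.sum().backward()

c.grad[0, 1] = 0  # zero the gradient of the element that should stay constant
opt.step()

print(c)  # c[0, 1] stays at 1.0, c[0, 0] has been updated
```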
Is this a correct way of dealing with the problem? Or are the other gradients (of a and b) influenced by the value of c.grad[0, 1] during the backpropagation step? Considering the gradient calculation, it seems they shouldn't be:
dL/dw^i_{jk} = (dL/da^i_j) * (da^i_j/dw^i_{jk})

where L = y.sum(), a^i_j is the value of the j-th element in the i-th layer of the computational graph (e.g. a^1_0 == (a @ x)[0]), and w^i_{jk} is the jk-th element of the i-th tensor (e.g. w^2_{01} == b[0, 1]). The first term of the above product is what gets passed on during backpropagation, and since it doesn't contain derivatives with respect to any weights, it seems that modifying parts of the gradient of a specific tensor wouldn't affect the gradients of the other tensors in the graph.
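As a sanity check, I compared the gradients of a and b with and without the zeroing. Since the zeroing happens only after backward() has finished, I would expect them to be identical:

```python
import torch

def grads(zero_c_elem):
    # rebuild the same small graph each time
    a = torch.ones((2, 2), requires_grad=True)
    b = torch.ones((2, 2), requires_grad=True)
    c = torch.ones((1, 2), requires_grad=True)
    x = torch.ones((2, 1))
    y = c @ (b @ (a @ x))
    y.sum().backward()
    if zero_c_elem:
        c.grad[0, 1] = 0  # modification after backprop has completed
    return a.grad.clone(), b.grad.clone()

ga1, gb1 = grads(zero_c_elem=False)
ga2, gb2 = grads(zero_c_elem=True)
print(torch.equal(ga1, ga2), torch.equal(gb1, gb2))  # True True
```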
I just want to double-check this, and ask in general whether this is the preferred way of dealing with such situations. Or is there a more elegant (more appropriate) way of handling constant tensor elements?
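For context, one alternative I came across is Tensor.register_hook, which masks the gradient during the backward pass itself rather than after it. A minimal sketch (the mask layout is just my assumption for this example):

```python
import torch

c = torch.ones((1, 2), requires_grad=True)
mask = torch.tensor([[1.0, 0.0]])  # 0 where the element should stay constant
# the hook's return value replaces the gradient flowing into c.grad
c.register_hook(lambda grad: grad * mask)

a = torch.ones((2, 2), requires_grad=True)
b = torch.ones((2, 2), requires_grad=True)
x = torch.ones((2, 1))

y = c @ (b @ (a @ x))
y.sum().backward()
print(c.grad)  # tensor([[4., 0.]]) -- masked element already zero
```

With this, any optimizer step would leave c[0, 1] untouched without the manual zeroing. Is this preferable?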
Any help is appreciated. Thanks.