What is the gradient like if I manipulate the parameters of an nn.Module as Variables?

Given m1 = nn.Linear(100, 50), m1 converts a Variable A (4 * 100) to a Variable B (4 * 50). Suppose the parameters
of m1 are W1 (a 50 * 100 tensor, since nn.Linear stores its weight as out_features * in_features) and b1 (a tensor of 50 elements).
If I take W1 as a Variable, and given C (a tensor of the same shape as W1), I do something like:

B = m1(A)
D = W1+C
loss1 = loss_func1(D,target1)
loss2 = loss_func2(B,target2)

What is the gradient like for W1, given that it is a parameter of m1 and not a plain Variable? Is there anything special?


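If you check W1 = m1.weight, it is an nn.Parameter, which is a subclass of autograd.Variable, so the parameters can be used like any other Variable. Nothing special happens to the gradient: once you backpropagate both losses, W1.grad holds the sum of the contribution from loss1 (through D) and the contribution from loss2 (through B) :slight_smile: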
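
To make this concrete, here is a minimal sketch of the setup from the question. It assumes a recent PyTorch release, where Variable has been merged into Tensor, so plain tensors are used instead of Variable wrappers. nn.MSELoss is only a stand-in for loss_func1 and loss_func2, which the question does not specify, and the random inputs and targets are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m1 = nn.Linear(100, 50)
W1 = m1.weight                      # nn.Parameter, shape (50, 100)

# Made-up data for illustration only
A = torch.randn(4, 100)
C = torch.randn(50, 100)            # same shape as W1
target1 = torch.randn(50, 100)
target2 = torch.randn(4, 50)

# Assumption: MSELoss stands in for the unspecified loss functions
loss_func1 = nn.MSELoss()
loss_func2 = nn.MSELoss()

B = m1(A)                           # W1 is used inside the module
D = W1 + C                          # W1 is used directly, like any tensor
loss1 = loss_func1(D, target1)
loss2 = loss_func2(B, target2)

# Gradient of each loss with respect to W1, computed separately
g1, = torch.autograd.grad(loss1, W1, retain_graph=True)
g2, = torch.autograd.grad(loss2, W1, retain_graph=True)

# Backpropagating both losses accumulates both contributions in W1.grad
(loss1 + loss2).backward()

print(isinstance(W1, nn.Parameter))
print(torch.allclose(W1.grad, g1 + g2))
```

Both checks are expected to hold: autograd treats the parameter as an ordinary leaf of the graph and sums the gradients from every path that uses it. Note that b1 (m1.bias) only receives a gradient from loss2, since it does not take part in computing D.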