Given m1 = nn.Linear(100, 50), m1 maps a Variable A (4 x 100) to a Variable B (4 x 50). Suppose the parameters
of m1 are W1 (which PyTorch stores as a 50 x 100 tensor, i.e. out_features x in_features) and b1 (a tensor of 50 elements).
Now suppose I treat W1 as a Variable and, given C (a tensor of the same shape as W1), do something like:
m1 = nn.Linear(100, 50)
W1 = m1.weight  # the weight Parameter of m1
B = m1(A)
D = W1 + C
loss1 = loss_func1(D, target1)
loss2 = loss_func2(B, target2)
loss = loss1 + loss2
loss.backward()
What does the gradient of W1 look like, given that it is a parameter of m1 rather than a plain Variable? Is there anything special about this case?
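
To make the setup concrete, here is a minimal runnable sketch of it. Several things are assumptions for illustration only: both loss functions are taken to be nn.MSELoss, A, C and the targets are random tensors, and grad_of is a hypothetical helper. It also assumes a recent PyTorch where plain tensors take the place of Variable. The sketch compares the gradient that W1 receives from the combined loss with the gradients it receives from each loss on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m1 = nn.Linear(100, 50)
W1 = m1.weight                    # nn.Parameter of shape (50, 100)

# Random stand-ins for the inputs and targets (assumed, not from the question)
A = torch.randn(4, 100)
C = torch.randn(50, 100)
target1 = torch.randn(50, 100)
target2 = torch.randn(4, 50)

# Assumed loss functions; any differentiable losses would do
loss_func1 = nn.MSELoss()
loss_func2 = nn.MSELoss()

def grad_of(loss):
    """Hypothetical helper: clear old grads, backprop, return a copy of W1's grad."""
    m1.zero_grad()
    loss.backward()
    return m1.weight.grad.clone()

# Gradient of W1 through the direct path only (D = W1 + C)
g1 = grad_of(loss_func1(W1 + C, target1))
# Gradient of W1 through the layer only (B = m1(A))
g2 = grad_of(loss_func2(m1(A), target2))
# Gradient of W1 from the combined loss, as in the question
g_total = grad_of(loss_func1(W1 + C, target1) + loss_func2(m1(A), target2))

# Check whether the combined gradient is the sum of the two path gradients
print(torch.allclose(g_total, g1 + g2, atol=1e-6))
```

In this sketch m1.weight is used like any other tensor that requires grad, so its .grad collects contributions from every path through which it reaches the loss.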