# How does one collect the weights and biases of a layer into a single matrix such that backpropagation works correctly?

I wanted to change the loss function, but the new loss function I was going to use depends on the weights and biases being organized into a single matrix.

For the sake of an example, imagine I have only two sets of weights, `V` (output layer) and `W` (input layer), and a bias `b` for the input layer only. These weights live inside the linear layers of a sequential model. I wanted to collect the input-layer weights and form the following matrix (in pseudocode):

```
W = [mdl.linear.weight, mdl.linear.bias]  # first set of params
V = mdl.linear.weight                     # second set of params
```

and I wanted the loss function to be something like this:

```
loss = train_error(mdl) + reg_lambda*||VW||^2  # ERM
```

How does one actually implement such a thing in PyTorch? It's not clear to me how this should work, or how re-assigning tensors works without screwing up backpropagation, because `V` and `W` are present in both the regularizer and the train error. I.e. the loss is a function as follows:

``loss(W,V) = train_error(W,V) + reg_lambda*R(W,V)``
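
To make the shape of the thing concrete, this is the general pattern I have in mind; the model, data, and `reg_lambda` below are stand-ins I made up for illustration, and the regularizer is just a placeholder for the real `R(W,V)`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model, data, and reg_lambda, just to make the pattern concrete.
mdl = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
x, y = torch.randn(8, 4), torch.randn(8, 2)
reg_lambda = 0.1

train_error = F.mse_loss(mdl(x), y)
reg = sum(p.norm(2) ** 2 for p in mdl.parameters())  # placeholder for R(W, V)
loss = train_error + reg_lambda * reg
loss.backward()  # each shared parameter accumulates gradient from both terms
```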

I'm not satisfied with the following solution, because I wanted to explicitly construct `W` and `V`, but I don't know how to do that without potentially screwing up backprop (due to in-place operations, re-assignments, and not knowing how to use `.clone()` properly). So this is the solution I have right now:

```
l = 2                    # norm order (L2)
b_w = mdl.linear.bias    # input-layer bias
W_p = mdl.linear.weight  # input-layer weight
V = mdl.linear.weight    # second set of params
regularization = (torch.matmul(V, W_p) + torch.matmul(V, b_w)).norm(l)
```
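
One thing I did convince myself of with a toy check: a plain re-assignment like `V = mdl.linear.weight` is just another name for the same tensor (and `.clone()` is itself differentiable), so neither of those should cut the graph. The square layer below is made up just for the check:

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 3)   # toy square layer so the matmul below is valid
V = lin.weight          # re-assignment: V is the same tensor, not a copy
reg = torch.matmul(V, lin.weight).norm(2)
reg.backward()
print(lin.weight.grad is not None)  # True -- the parameter still gets a gradient
```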

I think it should work, but it feels like I'm avoiding the main issue, which is that I don't know how to do it by creating a new matrix:

```
W = [W_p, b_w]
V = mdl.linear.weight
```

and have backprop work properly at the same time.

You could do (flattening all the parameters into one vector):

`W = torch.cat([mdl.linear.weight.view(-1), mdl.linear.bias.view(-1)], dim=0)`
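
If `W` needs to be an actual matrix `[W_p | b_w]` so that the product `VW` in the regularizer makes sense, the bias can instead be appended as an extra column with `torch.cat` along `dim=1`. Here is a sketch under assumed shapes; the two-layer `nn.Sequential` is a stand-in for the real model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in two-layer model and data (names and sizes are assumptions).
mdl = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
x, y = torch.randn(8, 4), torch.randn(8, 2)
reg_lambda = 0.1

# Append the input-layer bias as an extra column: W = [W_p | b_w].
W = torch.cat([mdl[0].weight, mdl[0].bias.unsqueeze(1)], dim=1)  # shape (3, 5)
V = mdl[1].weight                                                # shape (2, 3)

train_error = F.mse_loss(mdl(x), y)
loss = train_error + reg_lambda * torch.matmul(V, W).norm(2) ** 2
loss.backward()  # .grad is populated on the weight, the bias, and V
```

`torch.cat` is an out-of-place op tracked by autograd, so `W` stays connected to `mdl[0].weight` and `mdl[0].bias`; because those parameters also feed the train error, their `.grad` fields accumulate contributions from both terms, which is exactly what `loss(W,V) = train_error(W,V) + reg_lambda*R(W,V)` calls for.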