How does one collect the biases and weights of a layer into a single matrix such that backpropagation works correctly?

I want to change the loss function, but the new loss function I have in mind depends on the weights and biases being organized into one single matrix.

For the sake of an example, imagine I have only two sets of weights, V (output layer) and W (input layer), and a bias b for the input layer only. These weights live inside the linear layers of a sequential model. I want to collect the input layer's weight and bias and form the following matrix (in pseudocode):

W = [mdl.linear[0].weight, mdl.linear[0].bias] #first set of params (input layer)
V = mdl.linear[1].weight #second set of params (output layer)

and I want the loss function to be something like this:

loss = train_error(mdl) + reg_lambda*||VW||^2 #ERM

How does one actually implement such a thing in PyTorch? It's not clear to me how it should work and how re-assigning tensors works without breaking backpropagation, because V and W appear in both the regularizer and the train error, i.e. the loss is a function of the form:

loss(W,V) = train_error(W,V) + reg_lambda*R(W,V)

I'm not satisfied with the following solution because I wanted to explicitly construct W and V, but I don't know how to do that without potentially breaking backprop (due to in-place operations, re-assignments, and not knowing how to use .clone() properly). So this is the solution I have right now:

l = 2  # norm order
b_w = mdl[0].bias    # input-layer bias
W_p = mdl[0].weight  # input-layer weight
V = mdl[1].weight    # output-layer weight
# ||V*W_p + V*b_w||_2, computed directly from the layer parameters
regularization = (torch.matmul(V, W_p) + torch.matmul(V, b_w)).norm(l)
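The full ERM loss would then just add this to the train error, e.g. (train_error(mdl) being the same placeholder as in the pseudocode above):

loss = train_error(mdl) + reg_lambda * regularization**2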

I think it should work, but it feels like I'm avoiding the main issue, which is that I don't know how to do it by creating a new matrix:

W = [W_p, b_w]
V = mdl[1].weight

and have backprop work properly at the same time.

you could do:

W = torch.cat([mdl.linear[0].weight.view(-1), mdl.linear[0].bias.view(-1)], dim=0)
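torch.cat (and unsqueeze, view, etc.) are differentiable ops that stay on the autograd graph, so gradients from anything you compute with W still flow back to mdl.linear[0].weight and mdl.linear[0].bias. If you want the actual block matrix W = [W_p | b_w] rather than a flattened vector, concatenate the bias as an extra column instead. Here is a minimal self-contained sketch; the toy two-layer model, the random data, and the cross-entropy train error are placeholders I made up, not part of your model:

import torch
import torch.nn as nn

# toy two-layer model; sizes and data are made up for the example
mdl = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
reg_lambda = 0.01

# augmented matrix W = [W_p | b_w]: append the bias as an extra column;
# cat/unsqueeze are differentiable, so W stays on the autograd graph
W = torch.cat([mdl[0].weight, mdl[0].bias.unsqueeze(1)], dim=1)
V = mdl[1].weight

train_error = nn.functional.cross_entropy(mdl(x), y)
loss = train_error + reg_lambda * torch.matmul(V, W).norm(2)**2  # ERM

loss.backward()
print(mdl[0].bias.grad)  # not None, so backprop reached the original bias

Since W is just a view of the original parameters glued together, you never re-assign or clone anything: you read the parameters, build the product, and let autograd handle both the train error and the regularizer.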