Forward Pass with different weights

Hello pytorchers,

I would like to specify the weights used by my model during its forward pass. I basically have 2 sets of weights.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Policy(nn.Module):

    def __init__(self, hidden_size, num_inputs, action_space):
        super(Policy, self).__init__()
        self.action_space = action_space
        self.num_outputs = action_space.shape[0]
        self.num_inputs = num_inputs
        self.hidden_size = hidden_size

        self.fc1 = nn.Linear(self.num_inputs, self.hidden_size)
        self.mean = nn.Linear(self.hidden_size, self.num_outputs)
        self.std = nn.Linear(self.hidden_size, self.num_outputs)

    def forward(self, inputs, weights=None):
        # weights maps parameter names (e.g. 'fc1.weight') to tensors;
        # it must be passed explicitly, the None default is not handled
        x = F.relu(F.linear(inputs, weights['fc1.weight'], weights['fc1.bias']))
        mu = F.linear(x, weights['mean.weight'], weights['mean.bias'])
        sigma_sq = F.linear(x, weights['std.weight'], weights['std.bias'])
        sigma_sq = torch.exp(sigma_sq)
        return mu, sigma_sq

Q. Is this the right way to do it?
My forward function has a weights argument and I use the functional form of the Linear layer so that I can explicitly specify the weights.

I am not sure about my __init__() function, where I declare self.fc1, self.mean and self.std. The main reason for doing this is so that I can create my two sets of weights with the same parameter names, like so:

from collections import OrderedDict

model = Policy(hidden_size, num_inputs, action_space)
base_weights = OrderedDict((name, param) for (name, param) in model.named_parameters())
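
For illustration, here is a minimal, self-contained sketch of the pattern described above: a module whose forward pass reads its weights from a name-to-tensor dictionary. TinyPolicy, the integer layer sizes and the random input are hypothetical stand-ins, not the original Policy (which takes an action_space object).

```python
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for Policy, using plain integer sizes."""

    def __init__(self, hidden_size, num_inputs, num_outputs):
        super().__init__()
        # The layers are declared only so that named_parameters() yields
        # correctly shaped, correctly named tensors.
        self.fc1 = nn.Linear(num_inputs, hidden_size)
        self.mean = nn.Linear(hidden_size, num_outputs)

    def forward(self, inputs, weights):
        # The functional form applies whichever weights are passed in.
        x = F.relu(F.linear(inputs, weights['fc1.weight'], weights['fc1.bias']))
        return F.linear(x, weights['mean.weight'], weights['mean.bias'])


model = TinyPolicy(hidden_size=8, num_inputs=4, num_outputs=2)
base_weights = OrderedDict(model.named_parameters())
out = model(torch.randn(5, 4), base_weights)
```

Any other dictionary with the same keys and tensor shapes could be passed to the same forward call.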

Thanks in advance,

You are correct that the functional form allows you to specify the weights. However, the two sets of weights would need to be registered as module parameters to be trained. Currently your self.fc1, self.mean and self.std have nothing to do with the forward function. So even though they are included in named_parameters, they are not really useful. Also, I'm not sure why mean and std are linear layers. Their names seem to imply that they should be computed statistics.

To achieve two sets of weights, I think it would be simpler to register two layers and use a flag to control which set to use, e.g. something like

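import torch.nn as nn

class DoubleWeightLinear(nn.Module):
  def __init__(self, n_in, n_out):
    super().__init__()  # required before registering submodules
    self.l1 = nn.Linear(n_in, n_out)
    self.l2 = nn.Linear(n_in, n_out)

  def forward(self, x, use_first):
    # use_first selects which set of weights is applied
    if use_first:
      return self.l1(x)
    return self.l2(x)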

Hi @SimonW,

Thanks for the reply!
The mean and std are outputs of a policy network (continuous actions), but that is not really important, as it could also be a classification problem.
I see what you are saying about training. However, I am trying to train the weights manually, i.e. I am computing gradients w.r.t. a specific set of weights rather than calling loss.backward() (which would only train layers registered as module parameters, if I understand correctly).

So as long as I am updating the weights, it should be ok?
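
One way to compute gradients with respect to a chosen set of weights and update them by hand is torch.autograd.grad. The sketch below assumes a single linear layer, random data, an MSE loss and a plain SGD step with a made-up learning rate; none of these come from the original code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical weight set, keyed like the entries of named_parameters().
weights = {
    'fc1.weight': torch.randn(8, 4, requires_grad=True),
    'fc1.bias': torch.zeros(8, requires_grad=True),
}
inputs = torch.randn(5, 4)
targets = torch.randn(5, 8)


def compute_loss(w):
    pred = F.linear(inputs, w['fc1.weight'], w['fc1.bias'])
    return F.mse_loss(pred, targets)


loss = compute_loss(weights)

# Gradients w.r.t. exactly the tensors listed here, returned as a tuple
# instead of being accumulated into .grad.
grads = torch.autograd.grad(loss, list(weights.values()))

lr = 0.1  # assumed learning rate
with torch.no_grad():
    for w, g in zip(weights.values(), grads):
        w -= lr * g  # manual in-place SGD step

new_loss = compute_loss(weights)
```

Because the update is applied directly to the tensors in the dictionary, the next forward pass that receives the same dictionary sees the updated values.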


I see! You are extracting the weights after building the module, and then feeding them back into it. Now that makes sense! Ignore my previous comments about them not being trained. They are indeed being trained.

However, I don't see how you get two sets of weights in the original code. base_weights = OrderedDict((name, param) for (name, param) in model.named_parameters()) would just give you the same set of weights as the module itself :)
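
To make the point concrete, here is a small sketch showing that a dictionary built from named_parameters() holds the module's own tensors, and one possible way (an assumption, not something proposed in this thread) to obtain an independent second set by cloning. The nn.Linear(4, 2) model is a hypothetical stand-in.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in model

# The dictionary values are the very same Parameter objects the module owns.
base_weights = OrderedDict(model.named_parameters())
same_object = base_weights['weight'] is model.weight

# An independent second set: copy the values, cut the autograd history,
# and mark the copies as requiring gradients.
second_weights = OrderedDict(
    (name, param.clone().detach().requires_grad_(True))
    for name, param in model.named_parameters()
)

# Changing the copy leaves the module's own weights untouched.
with torch.no_grad():
    second_weights['weight'].add_(1.0)
independent = not torch.equal(second_weights['weight'], model.weight)
```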

What I would do is still the snippet I wrote above. Just make sure that, when you train manually, the set of parameters includes both l1's and l2's parameters (or whatever other names/types/networks you use).

Hi @SimonW
I am trying to implement model-agnostic meta-learning (MAML).
There is already a really nice reference implementation (for the supervised learning case) in PyTorch, but I am trying to see if I can do it another way.
The basic idea is that there is a set of base weights, and another set that is adapted to a specific task (by means of a gradient step). Gradients are needed w.r.t. either the base parameters or the adapted parameters.
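
A rough sketch of that idea, under stated assumptions: a single linear model, random data standing in for a task's training and validation batches, an MSE loss and a made-up inner learning rate. It is an illustration of one adaptation step, not the reference implementation mentioned above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Base weights (hypothetical single linear layer).
base = {
    'w': torch.randn(1, 3, requires_grad=True),
    'b': torch.zeros(1, requires_grad=True),
}


def forward(x, weights):
    return F.linear(x, weights['w'], weights['b'])


# Random stand-ins for one task's training and validation batches.
x_train, y_train = torch.randn(10, 3), torch.randn(10, 1)
x_val, y_val = torch.randn(10, 3), torch.randn(10, 1)

inner_lr = 0.01  # assumed inner-loop step size

# Inner step: create_graph=True keeps the gradient computation differentiable.
inner_loss = F.mse_loss(forward(x_train, base), y_train)
grads = torch.autograd.grad(inner_loss, list(base.values()), create_graph=True)
adapted = {name: w - inner_lr * g for (name, w), g in zip(base.items(), grads)}

# Outer loss is evaluated with the adapted weights.
outer_loss = F.mse_loss(forward(x_val, adapted), y_val)

# Gradients w.r.t. the adapted weights (keep the graph for the next call) ...
adapted_grads = torch.autograd.grad(
    outer_loss, list(adapted.values()), retain_graph=True
)
# ... and w.r.t. the base weights, flowing back through the inner step.
base_grads = torch.autograd.grad(outer_loss, list(base.values()))
```

The adapted weights are ordinary non-leaf tensors computed from the base weights, which is why the functional forward pass with an explicit weight dictionary fits this setting.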

Thanks for the replies, at least I think I am on the right track.

I remember that one! It is a cool paper :)

Then I think what you are doing is completely reasonable. Sorry for the confusion earlier.

@SimonW No worries,

Thanks again!