Add a layer to the beginning of a trained model

Hi everyone,
I want to do something similar to fine-tuning: add a layer to a trained network, but at the beginning rather than at the end, so that the output types stay the same as before. How should I do that?
Here is what the original model looks like:

import torch
import torch.nn as nn
import torch.nn.functional as F

# weights_init and normalized_columns_initializer are helper functions defined elsewhere in my code.

class ActorCritic(torch.nn.Module):
    def __init__(self, num_inputs, action_space):
        super(ActorCritic, self).__init__()
        self.conv1 = nn.Conv2d(num_inputs, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(32, 32, 3, stride=2, padding=1)

        self.lstm = nn.LSTMCell(32 * 3 * 3, 256)

        num_outputs = action_space.n
        self.critic_linear = nn.Linear(256, 1)
        self.actor_linear = nn.Linear(256, num_outputs)

        self.apply(weights_init)
        self.actor_linear.weight.data = normalized_columns_initializer(
            self.actor_linear.weight.data, 0.01)
        self.actor_linear.bias.data.fill_(0)
        self.critic_linear.weight.data = normalized_columns_initializer(
            self.critic_linear.weight.data, 1.0)
        self.critic_linear.bias.data.fill_(0)

        self.lstm.bias_ih.data.fill_(0)
        self.lstm.bias_hh.data.fill_(0)
        self.train()

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.conv1(inputs))
        x = F.elu(self.conv2(x))
        x = F.elu(self.conv3(x))
        x = F.elu(self.conv4(x))
        x = x.view(-1, 32 * 3 * 3)
        hx, cx = self.lstm(x, (hx, cx))
        x = hx
        return self.critic_linear(x), self.actor_linear(x), (hx, cx)

I want to add a layer (convolutional if possible) before conv1 that receives the input conv1 currently receives and passes its output on to conv1.

If you want to modify your model definition, you can just add another layer before self.conv1(inputs). If you don’t want to modify it, you can do something like

class MyModule(ActorCritic):
    def __init__(self, **kwargs):
        super(MyModule, self).__init__(**kwargs)
        self.new_conv = nn.Conv2d(3, 3, 1)  # 1x1 conv; 3 channels here is just an example

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.new_conv(inputs))
        return super(MyModule, self).forward((x, (hx, cx)))
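If you reuse your existing checkpoint with the subclass, note that it contains no weights for new_conv; in reasonably recent PyTorch versions you can pass strict=False to load_state_dict so the new layer just keeps its fresh initialization. A rough sketch (num_inputs=3 is only a placeholder):

model = MyModule(num_inputs=3, action_space=env.action_space)  # kwargs are forwarded to ActorCritic
state = torch.load(fname)['state_dict']                        # fname: path to your saved checkpoint
model.load_state_dict(state, strict=False)                     # 'new_conv.*' is simply absent from the checkpoint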

Or variations of that


Thank you!
It works, but for some reason the network doesn’t learn.
My new model looks like this:

class TLModel(torch.nn.Module):
    def __init__(self, pretrained_model, num_inputs):
        super(TLModel, self).__init__()
        self.new_layer = nn.Conv2d(num_inputs, num_inputs, 1)
        self.pretrained_model = pretrained_model

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.new_layer(inputs))
        return self.pretrained_model.forward((x, (hx, cx)))

Here is how I initialized the network:

ac_model = ActorCritic(env.observation_space.shape[0], env.action_space)
checkpoint = torch.load(fname)
ac_model.load_state_dict(checkpoint['state_dict'])
for param in ac_model.parameters():
    param.requires_grad = False
tlmodel = TLModel(ac_model, env.observation_space.shape[0])

and the optimizer is:
optimizer = my_optim.SharedAdam(tlmodel.new_layer.parameters(), lr=args.lr)
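As a quick sanity check of the freezing, listing which parameters still require gradients should show only the new layer. A small sketch:

# Sketch: verify that only new_layer's parameters are still trainable.
print([name for name, p in tlmodel.named_parameters() if p.requires_grad])
# expected output: ['new_layer.weight', 'new_layer.bias']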

I’d say it’s somewhat expected that it doesn’t learn, because the pre-trained model expects some input and the input it gets is completely different.
I’d maybe initialize the weights of the new_layer such that they start as the identity (with zero bias), so that your network has at least some chance to optimize.
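For the 1x1 conv, that could look something like this (a sketch; num_inputs is whatever channel count your observations have):

import torch.nn as nn

num_inputs = 3  # assumption: match the channel count of your observations

new_layer = nn.Conv2d(num_inputs, num_inputs, 1)
# Start as the identity: weight[i, i, 0, 0] = 1, everything else 0, zero bias,
# so the pretrained stack initially sees exactly the input it was trained on.
new_layer.weight.data.zero_()
for i in range(num_inputs):
    new_layer.weight.data[i, i, 0, 0] = 1.0
new_layer.bias.data.fill_(0)

That way the gradients only have to learn a deviation from the identity instead of fighting a random remapping of the input.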

Also, I’d advise calling self.pretrained_model((x, (hx, cx))) instead of calling .forward() directly, so that __call__ (and any registered hooks) runs as usual, but that’s a detail.


I had a similar problem trying to add an extra layer on top of a pretrained model, and I tried this solution. However, the new class doesn’t seem to be able to use the 2 GPUs even though I load it with DataParallel. The previous model trains just fine even when I load it with DataParallel.
This makes no sense to me, and I can’t seem to fix it after numerous tries; has this happened to you too?

Is there something else I have to do to make the new model run on DataParallel?

model = Model.One_More_Module(pretrained_model) #One_More_Module is a class that takes another network and creates one more layer

model = torch.nn.DataParallel(model).cuda()