Adding a linear layer to an existing model

I’m trying to add a new layer to an existing network (as the first layer) and train it on the original input. When I add a convolutional layer, everything works perfectly, but when I change it to a linear layer, it doesn’t work. Any ideas why?
Here is the whole network:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(torch.nn.Module):  # original model
    def __init__(self, num_inputs, action_space):
        super(ActorCritic, self).__init__()
        self.conv1 = nn.Conv2d(num_inputs, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(32, 32, 3, stride=2, padding=1)

        self.lstm = nn.LSTMCell(32 * 3 * 3, 256)

        num_outputs = action_space.n
        self.critic_linear = nn.Linear(256, 1)
        self.actor_linear = nn.Linear(256, num_outputs)

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.conv1(inputs))
        x = F.elu(self.conv2(x))
        x = F.elu(self.conv3(x))
        x = F.elu(self.conv4(x))
        x = x.view(-1, 32 * 3 * 3)
        hx, cx = self.lstm(x, (hx, cx))
        x = hx
        return self.critic_linear(x), self.actor_linear(x), (hx, cx)

class TLModel(torch.nn.Module): #new model
    def __init__(self, pretrained_model, num_inputs):
        super(TLModel, self).__init__()
        self.new_layer = nn.Linear(1*1*42*42, 1*1*42*42)
        self.pretrained_model = pretrained_model

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.new_layer(inputs.view(-1, 1*1*42*42)))
        return self.pretrained_model.forward((x.view(1,1,42,42), (hx, cx)))

I tried different activation functions (not just ELU). It works with a conv layer:

class TLModel(torch.nn.Module):
    def __init__(self, pretrained_model, num_inputs):
        super(TLModel, self).__init__()
        self.new_layer = nn.Conv2d(num_inputs, num_inputs, 1)
        self.pretrained_model = pretrained_model

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.new_layer(inputs))
        return self.pretrained_model.forward((x, (hx, cx)))

The number of inputs (num_inputs) is 1, and each input has shape 1x1x42x42.
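
One difference between the two variants that may matter is their size. The sketch below counts the trainable parameters of both candidate first layers, assuming the 1x1x42x42 input stated above; `count_params` is a helper name introduced here for illustration and is not part of the original code.

```python
import torch.nn as nn

def count_params(module):
    """Number of trainable parameters in a module (illustrative helper)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Same constructor arguments as the two TLModel variants, with num_inputs = 1.
linear_layer = nn.Linear(1 * 1 * 42 * 42, 1 * 1 * 42 * 42)
conv_layer = nn.Conv2d(1, 1, 1)

n_linear = count_params(linear_layer)  # 1764 * 1764 weights + 1764 biases
n_conv = count_params(conv_layer)      # 1 weight + 1 bias
print(n_linear, n_conv)
```

The 1x1 convolution can only scale and shift the whole image, so the pretrained network still receives an image-like input, while the randomly initialized linear layer mixes all 1764 pixels. Whether this explains the behaviour described here is an assumption; the thread does not confirm it.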

What kind of error did you get?

return self.pretrained_model.forward((x, (hx, cx)))

is equivalent to the following, which is the usual way to call a module (it also runs any registered hooks):

return self.pretrained_model((x, (hx, cx)))

It works either way. The problem is that the new layer doesn’t adjust itself to the rest of the network, and this happens only with a linear layer, not with a conv layer.

@Shani_Gamrian Your approach looks fine. What kind of error message do you get?

I tried to re-implement a simplified version of your approach (I just dropped some layers):

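import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class PretrainedNet(torch.nn.Module):  # original model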
    def __init__(self, num_inputs, num_outputs):
        super(PretrainedNet, self).__init__()
        self.conv1 = nn.Conv2d(num_inputs, 32, kernel_size=3, stride=1, padding=1)
        self.out = nn.Linear(10*10*32, num_outputs)

    def forward(self, x):
        x = F.elu(self.conv1(x))
        x = x.view(-1, 10*10*32)
        x = self.out(x)
        return x

class NewModel(torch.nn.Module): #new model
    def __init__(self, pretrained_model, num_inputs, shape_pretrain):
        super(NewModel, self).__init__()
        self.num_inputs = num_inputs
        self.shape_pretrain = shape_pretrain
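        # The second argument was cut off in the original post; it is assumed to equal
        # num_inputs so that the output can be reshaped to shape_pretrain.
        self.new_layer = nn.Linear(self.num_inputs, self.num_inputs)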
        self.pretrained_model = pretrained_model

    def forward(self, x):
        x = F.elu(self.new_layer(x.view(-1, self.num_inputs)))
        return self.pretrained_model.forward(x.view(*self.shape_pretrain))

# Generate random input
batch_size = 1
W = 10
H = 10
channels = 3
data = Variable(torch.FloatTensor(batch_size, channels, H, W).normal_())

# Create pretrained net
pretrained_net = PretrainedNet(num_inputs=channels, num_outputs=1)

# Create new net
shape_pretrain = np.array([batch_size, channels, H, W])
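# The remaining arguments were cut off in the original post; the values below are
# assumptions inferred from the NewModel constructor and the view() calls.
new_net = NewModel(pretrained_model=pretrained_net,
                   num_inputs=channels * H * W,
                   shape_pretrain=shape_pretrain)

# Test (assumed: a single forward pass)
output = new_net(data)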

This test case seems to work. I assume the error might be somewhere related to the LSTM etc.
I hope this short example is helpful for you.

I don’t get any errors; the new layer just doesn’t seem to adjust itself to the rest of the network, and it doesn’t train.
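
One way to narrow down "doesn’t train" is to check, for a single step, whether the new layer receives a gradient and whether its weights actually change. The following is a minimal sketch under stated assumptions: `TinyPretrained` and `Wrapped` are hypothetical stand-ins (not the ActorCritic model), the loss is a dummy, and the optimizer is built over the new layer only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class TinyPretrained(nn.Module):
    """Hypothetical stand-in for the pretrained network."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, 3, stride=2, padding=1)  # 42x42 -> 21x21
        self.out = nn.Linear(4 * 21 * 21, 1)

    def forward(self, x):
        x = F.elu(self.conv(x))
        return self.out(x.view(x.size(0), -1))

class Wrapped(nn.Module):
    """New linear first layer in front of the stand-in network."""
    def __init__(self, pretrained):
        super().__init__()
        self.new_layer = nn.Linear(42 * 42, 42 * 42)
        self.pretrained_model = pretrained

    def forward(self, x):
        x = F.elu(self.new_layer(x.view(-1, 42 * 42)))
        return self.pretrained_model(x.view(-1, 1, 42, 42))

model = Wrapped(TinyPretrained())
optimizer = torch.optim.SGD(model.new_layer.parameters(), lr=0.1)

before = model.new_layer.weight.detach().clone()
loss = model(torch.randn(2, 1, 42, 42)).pow(2).mean()  # dummy loss
optimizer.zero_grad()
loss.backward()
grad_norm = model.new_layer.weight.grad.norm().item()
optimizer.step()
weight_change = (model.new_layer.weight.detach() - before).abs().max().item()
print("grad norm:", grad_norm, "max weight change:", weight_change)
```

If the gradient is `None` or zero, the layer is cut off from the loss; if the gradient is non-zero but the weights do not move, the layer’s parameters are probably not registered with the optimizer.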

Oh, sorry. I misread the question.

Thanks for the question. I have the same issue in one of my networks.
Even though the network and the subsequent training seem to work, the model is not learning.


I have a generator from GAN training, and I am trying to assign labels to the generated images. To do this, I added an extra linear layer for classification on top of the GAN generator. GAN training had completed with very low loss for both the discriminator and the generator.

Two observations:

  1. The loss decreases over the iterations but never gets close to zero, even after 100 iterations. However, if I train the same model from random initialization (without the prior weights from the GAN), it learns quickly, within a couple of iterations. This is how I indirectly inferred that it is not learning.
  2. There are two labels in my data. One of them is classified correctly, but even after repeated attempts to teach the other label, learning is sparse. I would expect learning to be faster now.

Any insights into what’s going on would be appreciated.
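
For a setup like the one described (a pretrained generator with a new linear classification layer on top), a per-parameter gradient report can show where learning stalls. This is only a sketch: the small `generator` below is a hypothetical, untrained stand-in for the GAN generator, and the batch and labels are random dummy data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for a trained GAN generator (latent vector -> flat image).
generator = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 28 * 28), nn.Tanh()
)
# New classification layer on top of the generator output, two labels.
classifier = nn.Linear(28 * 28, 2)
model = nn.Sequential(generator, classifier)

z = torch.randn(8, 16)              # dummy latent batch
labels = torch.randint(0, 2, (8,))  # dummy labels
loss = F.cross_entropy(model(z), labels)
loss.backward()

# Gradient norm per parameter tensor. Very small values in some layers
# suggest that those layers receive almost no learning signal.
grad_norms = {name: p.grad.norm().item() for name, p in model.named_parameters()}
for name, norm in grad_norms.items():
    print(f"{name}: {norm:.3e}")
```

Running the same report on the real model, once with the GAN weights loaded and once from random initialization, would show whether the gradients differ noticeably between the two cases.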