Building complex CNN

Is it possible to build a CNN like the following easily in PyTorch?

X_train[0:99] -> Conv1 -> Conv2 -> MaxPool / + X[99:105] -> linear1 -> linear2 -> Output

In other words, the fully connected layers would receive more information than just what the conv layers produce. For example, imagine doing NLP on movie reviews where you also know the type of movie, which actors were in it, and so on. Could you feed that information to the fully connected layers while the conv layers analyze the actual sentences of the review?
Is this possible? Are there any examples I could look at? Is it worth trying out this technique?

How would you like to split your input data?
Since it seems you would like to use a one-dimensional conv layer, your input should be of shape [batch_size, channels, length].
Are you splitting X_train based on the length? Also, do you want to concatenate the split input?
If so, this could serve as starter code:

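import torch
import torch.nn as nn
import torch.nn.functional as F


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv1d(5, 10, 3, 1, 1)
        self.conv2 = nn.Conv1d(10, 10, 3, 1, 1)
        self.pool2 = nn.MaxPool1d(2)
        # 10 channels * 50 steps from the conv branch
        # + 5 channels * 5 steps that bypass the conv layers
        self.fc1 = nn.Linear(10 * 50 + 5 * 5, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, x):
        # Split the input along the length dimension
        x1 = x[:, :, :100]
        x2 = x[:, :, 100:].contiguous().view(x.size(0), -1)
        x1 = F.relu(self.conv1(x1))
        x1 = F.relu(self.conv2(x1))
        x1 = self.pool2(x1)
        # Flatten the conv features to [batch_size, 10 * 50]
        x1 = x1.view(x1.size(0), -1)
        # Concatenate conv features and extra features along the feature dimension
        x = torch.cat((x1, x2), 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x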

batch_size = 1
channels = 5
length = 105
x = torch.randn(batch_size, channels, length)
model = MyModel()

output = model(x)
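
To see where the 10 * 50 + 5 * 5 input size of fc1 comes from, the shapes of both branches can be traced step by step. This is only a sketch that assumes the same layer settings as the starter code; the variable names (conv_part, extra_part, and so on) are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 5, 105)            # [batch_size, channels, length]

conv1 = nn.Conv1d(5, 10, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv1d(10, 10, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool1d(2)

conv_part = x[:, :, :100]             # first 100 steps go through the conv layers
extra_part = x[:, :, 100:]            # last 5 steps bypass them

# kernel_size=3 with padding=1 keeps the length at 100; pooling halves it to 50
conv_out = pool(conv2(conv1(conv_part)))
conv_flat = conv_out.flatten(1)       # 10 channels * 50 steps
extra_flat = extra_part.flatten(1)    # 5 channels * 5 steps

print(conv_out.shape)    # torch.Size([1, 10, 50])
print(conv_flat.shape)   # torch.Size([1, 500])
print(extra_flat.shape)  # torch.Size([1, 25])
```

If you change the kernel size, padding, pooling, or the split point, the in_features of fc1 has to change accordingly.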

Note that you might want to check the value ranges of x1 and x2 before the concatenation, as they might be quite different, which could make training difficult.
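
One way to check this is to print basic statistics for both branches right before they are concatenated. The sketch below uses random placeholder tensors in place of the real features, and the helper describe is hypothetical. Normalizing each branch with nn.BatchNorm1d is only one possible remedy and is an assumption here, not part of the starter code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder tensors standing in for the two flattened branches:
# conv features after ReLU, and raw extra features on a much larger scale.
conv_feat = torch.relu(torch.randn(8, 500))
extra_feat = torch.randn(8, 25) * 100.0

def describe(name, t):
    # Print min, max, mean, and std so the value ranges can be compared
    print(f"{name}: min={t.min().item():.2f} max={t.max().item():.2f} "
          f"mean={t.mean().item():.2f} std={t.std().item():.2f}")

describe("conv", conv_feat)
describe("extra", extra_feat)

# Possible remedy (assumption): normalize each branch separately
# so that neither dominates the input of the first linear layer.
norm_conv = nn.BatchNorm1d(500)
norm_extra = nn.BatchNorm1d(25)

merged = torch.cat((norm_conv(conv_feat), norm_extra(extra_feat)), dim=1)
describe("merged", merged)
```

Standardizing the extra features once during preprocessing would be a simpler alternative if their statistics are known up front.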


This is incredible! Thank you so much.

@ptrblck works perfectly! By any chance, do you know anyone who does similar things, so that I can look at the hyperparameters and architecture they are using? I read all of cs231n and it was super interesting. Thank you for that advice last week.

We had quite a long discussion about a similar topic in this thread.
Maybe you can get some ideas for your approach. :wink:

Sorry to bother you. It looks like I have the opposite problem to the one in that thread: my network won’t overfit. Below is the loss curve. The downward jumps are where I decreased the learning rate (learning rate annealing). I am using Adam as my optimizer. The y axis is the loss and the x axis is the number of epochs.

[Image: loss curve, last 100 epochs]

Any idea how I can force it to overfit?

I would scale down the problem to just a single input and try to overfit your model.
If that’s not possible, your architecture, the hyperparameters, or the training routine might have a bug or might not be suitable for the problem.
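
A minimal sketch of such a sanity check is shown below. TinyModel is a small hypothetical stand-in for your own model, the sample and label are random placeholders, and the use of nn.CrossEntropyLoss assumes a classification task. With a single fixed sample, the loss should drop close to zero within a few hundred steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class TinyModel(nn.Module):
    # Hypothetical stand-in; replace it with the model you want to debug
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(5, 10, 3, 1, 1)
        self.pool = nn.MaxPool1d(2)
        self.fc = nn.Linear(10 * 50, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = TinyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One fixed sample and its label (random placeholders)
x = torch.randn(1, 5, 100)
y = torch.tensor([1])

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the loss does not approach zero even in this setting, the problem is more likely in the model or the training loop than in the data. Once it works, you can grow the subset step by step (for example to 10 and then 100 samples).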
