Concatenating more input features onto convolutional layer output

Hello Everyone,

I am currently implementing a regression model in PyTorch using a CNN. The model takes an image as input, but it also has to take in some vectorized information, i.e. a 1-D input vector.

My images are fairly large (256x512), and my dataset is also very large (5 million samples). Long story short, I only have to store 500 images, since every chunk of 10k vectorized inputs shares the same image.

I want to exploit this redundancy during training as well: instead of running the same image through my CNN 1000 times (my batch size is 1000), I run it through the convolutional layers once and vary only the vectorized input.
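
To make the idea concrete, here is a minimal sketch of the pattern, not my actual model. The class name `SharedImageRegressor`, the tiny one-layer feature extractor, and all sizes are made up for illustration. It also uses `expand` rather than `repeat`, which is an assumption on my part: `expand` returns a view of the single feature row instead of copying it.

```python
import torch
import torch.nn as nn


class SharedImageRegressor(nn.Module):
    """Hypothetical toy model: one image shared by a batch of vectors."""

    def __init__(self, img_h, img_w, vec_dim, num_outputs):
        super().__init__()
        # Tiny stand-in for the real convolutional stack.
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            dummy = torch.ones(1, 1, img_h, img_w)
            flat_dim = self.features(dummy).flatten(1).size(1)
        self.classifier = nn.Linear(flat_dim + vec_dim, num_outputs)

    def forward(self, image, vectors):
        # image: (1, 1, H, W), vectors: (N, vec_dim)
        out = self.features(image).flatten(1)      # (1, flat_dim), computed once
        out = out.expand(vectors.size(0), -1)      # (N, flat_dim), view, no copy
        out = torch.cat((out, vectors), dim=1)     # (N, flat_dim + vec_dim)
        return self.classifier(out)


model = SharedImageRegressor(img_h=8, img_w=16, vec_dim=208, num_outputs=1)
image = torch.randn(1, 1, 8, 16)    # one shared image
vectors = torch.randn(5, 208)       # five vectorized inputs for that image
pred = model(image, vectors)
print(pred.shape)  # torch.Size([5, 1])
```

The convolutional layers see a batch of 1, while the linear layer sees a batch of N.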

The model I’m using is a VGG with the following code:

import torch
import torch.nn as nn

cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

class VGG(nn.Module):
    def __init__(self, vgg_name, imgH, imgW, numClasses):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name])
        #Figure out the shape for flatten
        x = torch.ones(1,1,imgH, imgW)
        outputShape = self.FigureOutFlattenShape(x)
        # 208 is the length of the vectorized input concatenated in forward()
        self.classifier = nn.Linear(outputShape[1] + 208, numClasses)
        #self.classifier = nn.Linear(512, 10)

    #Added by K to figure out the flattening layer 
    def FigureOutFlattenShape(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        return out.shape

    #x is a tuple, x[0] contains a single image, x[1] contains vectorized input vector
    def forward(self, x):
        out = self.features(x[0])
        out = out.view(out.size(0), -1)
        out = out.repeat(len(x[1]), 1)
        out = torch.cat((out, x[1]), 1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 1
        for x in cfg:
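            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                # NOTE: BatchNorm + ReLU after each conv is assumed here
                # (the usual VGG-with-batch-norm block).
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x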
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

The key part of this post is in forward(), where I run "out = out.repeat(len(x[1]), 1)". My concern is that this will somehow break training, since the actual training batch size is 1000 but the convolutional layers only see 1 input. My code, as it is set up now, runs and the model is improving, but I am worried that training will either be very slow or plateau very soon.
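
As a sanity check on the gradient flow, here is a small sketch with toy tensors. The sizes are hypothetical stand-ins, not my real shapes. As far as I understand autograd, `repeat` is differentiable, and the gradient that reaches the single feature row is the sum of the gradients of all its copies:

```python
import torch

# Stand-in for the flattened conv output of ONE image (hypothetical size 4).
feat = torch.ones(1, 4, requires_grad=True)
# Stand-in for a batch of 3 vectorized inputs (hypothetical size 2).
vec = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Same pattern as in forward(): tile the single row, then concatenate.
tiled = feat.repeat(vec.size(0), 1)        # shape (3, 4)
joined = torch.cat((tiled, vec), dim=1)    # shape (3, 6)

# With a plain sum, every element of `tiled` gets a gradient of 1.
joined.sum().backward()

# The three copies each send a gradient of 1 back to the same source row.
print(feat.grad)  # tensor([[3., 3., 3., 3.]])
```

So the convolutional layers do receive a signal from every row of the batch. How large that signal is depends on whether the loss is summed or averaged over the batch.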

One symptom I’m already seeing is that my learning rate has to be very small (1e-5); otherwise the training loss grows exponentially.

I feel like this is somewhat of a niche concern, but if anyone has any experience with similar configurations and can give some advice or pointers that would help greatly.