Processing a sequence of images with Conv2d

Hello,
I have image sequences for 300 persons; each person has 8 images and each image has 6 channels. I pass each sequence to my network as [8, 6, 128, 64]. My network is given below:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    
    def __init__(self):
        super(MyModel, self).__init__()
        self.CNNLayer1 = nn.Sequential(
                nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=4),
                nn.Tanh(),
                nn.MaxPool2d(kernel_size=2, stride=2)
                )      
        self.CNNLayer2 = nn.Sequential(
                nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=4),
                nn.Tanh(),
                nn.MaxPool2d(kernel_size=2, stride=2)
                )
        self.CNNLayer3 = nn.Sequential(
                nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=4),
                nn.Tanh(),
                nn.MaxPool2d(kernel_size=2, stride=2)
                )
        self.DropoutLayer = nn.Dropout(0.6)
        self.FullyConnected = nn.Linear(1, 8, 32*19*11)
        
        
        #here i have to put RNN and then convert the sequence to a [1,8, channels*width*height] matrix
        #after RNN here i want to get a [1,128] feature vector for the whole sequence, i.e. the 8 images
        #should have one vector
    def forward(self, inp):
        #print(inp.shape) #prints torch.Size([1, 8, 6, 128, 64])[batchsize, sequenceLength, channels, H, W]
        out = self.CNNLayer1(inp[0])
        out = self.CNNLayer2(out)
        out = self.CNNLayer3(out)
        out = self.DropoutLayer(out)
        print(out.shape) # prints torch.Size([8, 32, 19, 11]) [sequenceLength, channels, H, W]
        out = self.FullyConnected(out)
        return out

this error arises:

RuntimeError: size mismatch, m1: [4864 x 11], m2: [1 x 8] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266

@ptrblck sir, help please?

nn.Linear only takes in_features and out_features as arguments, along with bias, which is a bool. You are passing three positional arguments; see the docs.
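For reference, a minimal sketch of the expected call (the feature sizes here are toy values, not taken from your model):

import torch
import torch.nn as nn

fc = nn.Linear(in_features=20, out_features=5, bias=True)
x = torch.randn(3, 20)     # batch of 3 samples, 20 features each
print(fc(x).shape)         # torch.Size([3, 5])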

Okay, I will try this.

So I changed my code to something like this:

....
self.DropoutLayer = nn.Dropout(0.6)
self.FullyConnected = nn.Linear(8, 32*19*11)

the forward is:

def forward(self, inp):
        #print(inp.shape) #prints torch.Size([1, 8, 6, 128, 64])[batchsize, sequenceLength, channels, H, W]
        out = self.CNNLayer1(inp[0])
        out = self.CNNLayer2(out)
        out = self.CNNLayer3(out)
        out = self.DropoutLayer(out)
        print(out.shape)#torch.Size([8, 32, 19, 11])
        out = self.FullyConnected(out)
        return out

This error arises at out = self.FullyConnected(out):

RuntimeError: size mismatch, m1: [4864 x 11], m2: [8 x 6688] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266

Sorry for the late response, I had gone out for a while.

You are still misunderstanding the nn.Linear layer. Suppose you have the following:

out = self.DropoutLayer(out)
# Out here gives you the shape [8, 32, 19, 11]

The significance of the numbers: 8 is the batch size, 32 is the number of channels, and 19 and 11 are the height and width of the feature map.
Now you want to add a Linear layer, which simply takes the flattened version of the input, i.e. (8, 32*19*11), as its input.

Here is the point you are missing: when you use a dense layer, you also have to specify the output size. For example, if you have 10 output classes, the layer needs 10 output neurons.

in_features -> input size (in your case it is 32*19*11)
out_features -> output size (which you have not defined, I think it is 128)

# In the __init__
self.FullyConnected = nn.Linear(32*19*11, 128)

# In the forward
out = out.view(-1, 32*19*11)
# The linear layer expects a 2D input where the first dim is the batch size
out = self.FullyConnected(out)
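
Putting it all together with the RNN you mentioned in your comments, a rough sketch could look like the one below. The choice of nn.GRU, the hidden size of 128, the model name MySequenceModel, and using the last hidden state as the sequence feature are assumptions on my part, not the only way to do it:

import torch
import torch.nn as nn

class MySequenceModel(nn.Module):
    def __init__(self):
        super(MySequenceModel, self).__init__()
        self.CNNLayer1 = nn.Sequential(
                nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=4),
                nn.Tanh(),
                nn.MaxPool2d(kernel_size=2, stride=2))
        self.CNNLayer2 = nn.Sequential(
                nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=4),
                nn.Tanh(),
                nn.MaxPool2d(kernel_size=2, stride=2))
        self.CNNLayer3 = nn.Sequential(
                nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=4),
                nn.Tanh(),
                nn.MaxPool2d(kernel_size=2, stride=2))
        self.DropoutLayer = nn.Dropout(0.6)
        self.FullyConnected = nn.Linear(32*19*11, 128)
        # Assumed RNN: a GRU over the 8 per-frame feature vectors
        self.rnn = nn.GRU(input_size=128, hidden_size=128, batch_first=True)

    def forward(self, inp):
        # inp: [1, 8, 6, 128, 64] -> drop the batch dim and treat the 8 frames as a CNN batch
        out = self.CNNLayer1(inp[0])
        out = self.CNNLayer2(out)
        out = self.CNNLayer3(out)
        out = self.DropoutLayer(out)       # [8, 32, 19, 11]
        out = out.view(-1, 32*19*11)       # [8, 32*19*11]
        out = self.FullyConnected(out)     # [8, 128], one feature vector per frame
        out = out.unsqueeze(0)             # [1, 8, 128], restore the batch dim for the GRU
        _, h_n = self.rnn(out)             # h_n: [1, 1, 128], last hidden state
        return h_n.squeeze(0)              # [1, 128], one vector for the whole sequence

With this, model = MySequenceModel() followed by model(torch.randn(1, 8, 6, 128, 64)) should return a [1, 128] tensor for the whole 8-image sequence.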