CNN on 2d image with one channel

Similar questions have been asked, but I have not been able to solve the problem. I am trying to create a CNN over an image with one channel, and I keep getting variations of a dimension error. The input image is of size (99 x 99) and the batch size is 4.

The shape of the input is (4 x 99 x 99). When I tried to pass this into the CNN I obviously got an error, because I told it there was only one channel while the tensor looked as if there were 99. So I used unsqueeze_(1) to get the shape (4 x 1 x 99 x 99), which seemed correct. However, I now get this error:
RuntimeError: Expected 3-dimensional tensor, but got 4-dimensional tensor for argument #1 'self' (while checking arguments for max_pool1d)

I am not sure how to solve this. I am using an nn.Conv2d layer as my first layer.

The unsqueeze was necessary, as your channel dimension was missing.
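For reference, this is what the shape fix looks like on a random tensor (a minimal sketch):

import torch

x = torch.randn(4, 99, 99)  # [batch_size, height, width] - channel dim missing
x = x.unsqueeze(1)          # insert the channel dimension at position 1
print(x.shape)
> torch.Size([4, 1, 99, 99])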
Since you are using an image, you should also use nn.MaxPool2d instead of nn.MaxPool1d.
Here is a small example:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=1),  # keeps 99x99
    nn.MaxPool2d(2)  # halves the spatial size
)
x = torch.randn(4, 1, 99, 99)
output = model(x)
print(output.shape)
> torch.Size([4, 3, 49, 49])
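The convolution keeps the spatial size at 99 (kernel 3, stride 1, padding 1), and the pooling then reduces it to floor(99 / 2) = 49.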

Thank you, it works perfectly! A side question: by any chance do you know any resources that would help me understand which CNN architecture and kernel size to use? I am new to the space, and a lot of the blogs and textbooks I have read recommend searching for a similar model and copying its architecture. I am using a CNN in a not exactly conventional way, so I can't find anything similar.

It depends a bit on your image statistics etc.
Usually a kernel size of 3 works quite well, as a lot of models use it (see VGG etc.) for a "natural" image of approx. 224x224.
If you have a medical image (e.g. an MRI) in a high resolution, that kernel size might not be the best choice.
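As a rough illustration of why 3x3 kernels are so popular (the VGG argument): two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with fewer parameters. A small sketch (the channel count of 64 is just an arbitrary example):

import torch.nn as nn

# two stacked 3x3 convs: 5x5 receptive field, 2 * (64*64*3*3 + 64) = 73856 params
stacked = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)
# a single 5x5 conv: same receptive field, 64*64*5*5 + 64 = 102464 params
single = nn.Conv2d(64, 64, kernel_size=5, padding=2)

num_params = lambda m: sum(p.numel() for p in m.parameters())
print(num_params(stacked), num_params(single))
> 73856 102464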

A good resource to get a feeling for convolutions is Stanford's CS231n.


Hi,
I'm having a similar problem with AvgPool1d.

This is the code:
# inside __init__
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
if (self.extraClasses == 1 or extraLayer == 1) and extraFeat == False:
    self.linear = nn.Linear(512 * block.expansion, num_classes)
if self.pool:
    self.scores = nn.AvgPool1d(2, stride=2)

def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    cnnOut = out
    layer1 = self.layer1(out)
    layer2 = self.layer2(layer1)
    layer3 = self.layer3(layer2)
    layer4 = self.layer4(layer3)

    out = F.avg_pool2d(layer4, 4)
    out = out.view(out.size(0), -1)  # out is 2D from here on: [batch_size, features]
    featureLayer = out
    pool = 0
    if self.extraClasses == 1:
        out = self.linear(out)
        if self.pool:
            pool = self.scores(out)  # fails here: AvgPool1d gets a 2D tensor

When it attempts to calculate the pooling, I get the following:

RuntimeError: Expected 3-dimensional tensor, but got 2-dimensional tensor for argument #1 'self' (while checking arguments for avg_pool1d)

Thanks!

nn.AvgPool1d expects a 3D input tensor in the shape [batch_size, channels, seq_len], while you are trying to use a 2D tensor. Could you explain your use case a bit more and what result you would expect?
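To illustrate the shape requirement (a minimal sketch): you could add a dummy channel dimension via unsqueeze before the pooling and remove it afterwards:

import torch
import torch.nn as nn

pool = nn.AvgPool1d(2, stride=2)
x = torch.randn(10, 20)    # 2D: [batch_size, features] - this raises the error
y = pool(x.unsqueeze(1))   # add a channel dim -> [10, 1, 20]
print(y.squeeze(1).shape)
> torch.Size([10, 10])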

I want to take every 2 neurons at the end and sum/average them in the last layer. That means if I had 20 neurons in the next-to-last layer, I will have 10 neurons in the last layer.

Assuming “between them” means neighboring features, a view and mean operation should work:

import torch

batch_size = 10
features = 20
x = torch.arange(batch_size * features).view(batch_size, features).float()

x = x.view(batch_size, -1, 2)  # group neighboring features in pairs: [10, 10, 2]
y = x.mean(dim=2)              # average each pair -> [10, 10]
print(y)
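For the first sample this averages (0, 1) -> 0.5, (2, 3) -> 2.5, and so on, halving the 20 features to 10 values. It is equivalent to the nn.AvgPool1d(2, stride=2) approach after unsqueezing a channel dimension, just without the extra dimension handling.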