CNN on 2d image with one channel

Similar questions have been asked, but I have not been able to solve the problem. I am trying to create a CNN over an image with one channel, and I keep getting variations of a dimension error. The input image is of size (99 x 99) and the batch size is 4.

The shape of the input is (4 x 99 x 99). When I tried to pass this into the CNN I obviously got an error, because I told it there was only one channel while the tensor looked as if there were 99. So I used unsqueeze_(1) to get the shape (4 x 1 x 99 x 99), which seemed correct. However, I now get this error:
RuntimeError: Expected 3-dimensional tensor, but got 4-dimensional tensor for argument #1 'self' (while checking arguments for max_pool1d)

I am not sure how to solve this. I am using an nn.Conv2d layer as my first layer.

The unsqueeze was necessary, as your channel dimension was missing.
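For reference, this is what the shape fix looks like on a random tensor (a minimal sketch):

import torch

x = torch.randn(4, 99, 99)  # [batch_size, height, width] - channel dim missing
x = x.unsqueeze(1)          # insert the channel dimension at position 1
print(x.shape)
> torch.Size([4, 1, 99, 99])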
Since you are using an image, you should also use nn.MaxPool2d instead of nn.MaxPool1d.
Here is a small example:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=1),  # keeps 99x99
    nn.MaxPool2d(2)  # halves the spatial size
)
x = torch.randn(4, 1, 99, 99)
output = model(x)
print(output.shape)
> torch.Size([4, 3, 49, 49])
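The convolution keeps the spatial size at 99 (kernel 3, stride 1, padding 1), and the pooling then reduces it to floor(99 / 2) = 49.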

Thank you, it works perfectly! A side question: by any chance do you know any resources that would help me understand which CNN architecture and kernel size to use? I am new to the space, and a lot of the blogs and textbooks I have read recommend searching for a similar model and copying its architecture. I am using a CNN in a not exactly conventional way, so I can't find anything similar.

It depends a bit on your image statistics etc.
Usually a kernel size of 3 works quite well, as a lot of models use it (see VGG etc.) for a "natural" image of approx. 224x224.
If you have a medical image (e.g. an MRI) in a high resolution, that kernel size might not be the best choice.
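As a rough illustration of why 3x3 kernels are so popular (the VGG argument): two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with fewer parameters. A small sketch (the channel count of 64 is just an arbitrary example):

import torch.nn as nn

# two stacked 3x3 convs: 5x5 receptive field, 2 * (64*64*3*3 + 64) = 73856 params
stacked = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)
# a single 5x5 conv: same receptive field, 64*64*5*5 + 64 = 102464 params
single = nn.Conv2d(64, 64, kernel_size=5, padding=2)

num_params = lambda m: sum(p.numel() for p in m.parameters())
print(num_params(stacked), num_params(single))
> 73856 102464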

A good resource to get a feeling for convolutions is Stanford's CS231n.


Hi,
I'm having a similar problem with AvgPool1d.

This is the code:
# inside __init__
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
if (self.extraClasses == 1 or extraLayer == 1) and extraFeat == False:
    self.linear = nn.Linear(512 * block.expansion, num_classes)
if self.pool:
    self.scores = nn.AvgPool1d(2, stride=2)

def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    cnnOut = out
    layer1 = self.layer1(out)
    layer2 = self.layer2(layer1)
    layer3 = self.layer3(layer2)
    layer4 = self.layer4(layer3)

    out = F.avg_pool2d(layer4, 4)
    out = out.view(out.size(0), -1)  # out is 2D from here on: [batch_size, features]
    featureLayer = out
    pool = 0
    if self.extraClasses == 1:
        out = self.linear(out)
        if self.pool:
            pool = self.scores(out)  # fails here: AvgPool1d gets a 2D tensor

When it attempts to calculate the pooling, I get the following:

RuntimeError: Expected 3-dimensional tensor, but got 2-dimensional tensor for argument #1 'self' (while checking arguments for avg_pool1d)

Thanks!

nn.AvgPool1d expects a 3D input tensor in the shape [batch_size, channels, seq_len], while you are trying to use a 2D tensor. Could you explain your use case a bit more and what result you would expect?
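To illustrate the shape requirement (a minimal sketch): you could add a dummy channel dimension via unsqueeze before the pooling and remove it afterwards:

import torch
import torch.nn as nn

pool = nn.AvgPool1d(2, stride=2)
x = torch.randn(10, 20)    # 2D: [batch_size, features] - this raises the error
y = pool(x.unsqueeze(1))   # add a channel dim -> [10, 1, 20]
print(y.squeeze(1).shape)
> torch.Size([10, 10])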

I want to take every 2 neurons at the end and sum/average them in the last layer. That means if I had 20 neurons in the next-to-last layer, I will have 10 neurons in the last layer.

Assuming “between them” means neighboring features, a view and mean operation should work:

import torch

batch_size = 10
features = 20
x = torch.arange(batch_size * features).view(batch_size, features).float()

x = x.view(batch_size, -1, 2)  # group neighboring features in pairs: [10, 10, 2]
y = x.mean(dim=2)              # average each pair -> [10, 10]
print(y)
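For the first sample this averages (0, 1) -> 0.5, (2, 3) -> 2.5, and so on, halving the 20 features to 10 values. It is equivalent to the nn.AvgPool1d(2, stride=2) approach after unsqueezing a channel dimension, just without the extra dimension handling.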