Similar questions have been asked, but I have not been able to solve the problem. I am trying to create a CNN over an image with one channel, and I keep getting variations of a dimension error. The input image is of size (99 x 99) with a batch size of 4, so the shape of the input is (4 x 99 x 99). When I passed this into the CNN I obviously got an error, because I was telling it there was only one channel while the tensor was being interpreted as having 99 channels. So I called unsqueeze_(1) to get the shape (4 x 1 x 99 x 99), which seemed correct. However, I get this error:
RuntimeError: Expected 3-dimensional tensor, but got 4-dimensional tensor for argument #1 ‘self’ (while checking arguments for max_pool1d)
Which I am not sure how to solve. I am using a nn.Conv2d layer as my first layer.
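To reproduce the shape handling described above, here is a minimal sketch (the tensor is random dummy data):

```python
import torch

x = torch.randn(4, 99, 99)  # batch of 4 single-channel 99x99 images, channel dim missing
x = x.unsqueeze(1)          # insert the channel dimension at position 1
print(x.shape)              # torch.Size([4, 1, 99, 99])
```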
unsqueeze was necessary, as your channel dimension was missing.
Since you are using an image, you should also use nn.MaxPool2d instead of nn.MaxPool1d, which your model is currently calling (hence the max_pool1d error).
Here is a small example:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=2)
)
x = torch.randn(4, 1, 99, 99)
output = model(x)
print(output.shape)
> torch.Size([4, 3, 49, 49])
Thank you, works perfectly! A side question: by any chance do you know any resources that would help me understand which CNN architecture and kernel size to use? I am new to the space, and a lot of blogs and textbooks I have read recommend finding a similar model and copying its architecture. I am using a CNN in a not exactly conventional way, so I can't find anything similar.
It depends a bit on your image statistics etc.
Usually a kernel size of 3 works quite well for "natural" images, as a lot of models use it (see VGG etc.).
If you have a medical image (e.g. MRI) at a high resolution, a kernel size of 3 might not be the best choice.
A good resource for getting a feeling for convolutions is Stanford's CS231n.
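To give some intuition for why small kernels are so common: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but use fewer parameters and add an extra non-linearity (this is the argument made for VGG). A small sketch, with an arbitrary channel count of 64:

```python
import torch.nn as nn

c = 64  # illustrative channel count
two_3x3 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(c, c, kernel_size=3, padding=1),
)
one_5x5 = nn.Conv2d(c, c, kernel_size=5, padding=2)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

print(num_params(two_3x3))  # 73856
print(num_params(one_5x5))  # 102464
```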
I'm having a similar problem with avg_pool1d. This is the code:
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks, stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks, stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks, stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks, stride=2)
if (self.extraClasses == 1 or extraLayer == 1) and extraFeat == False:
    self.linear = nn.Linear(512 * block.expansion, num_classes)
self.scores = nn.AvgPool1d(2, stride=2)

def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    cnnOut = out
    layer1 = self.layer1(out)
    layer2 = self.layer2(layer1)
    layer3 = self.layer3(layer2)
    layer4 = self.layer4(layer3)
    out = F.avg_pool2d(layer4, 4)
    out = out.view(out.size(0), -1)
    featureLayer = out
    pool = 0
    if self.extraClasses == 1:
        out = self.linear(out)
        pool = self.scores(out)
When trying to calculate the pooling, I get the following:
RuntimeError: Expected 3-dimensional tensor, but got 2-dimensional tensor for argument #1 ‘self’ (while checking arguments for avg_pool1d)
nn.AvgPool1d expects a 3D input tensor in the shape [batch_size, channels, seq_len], while you are passing a 2D tensor. Could you explain your use case a bit more and what result you would expect?
I want to take each pair of neurons at the end and sum/average them in the last layer.
Meaning that if I had 20 neurons in the second-to-last layer, I'll have 10 neurons in the last layer.
Assuming "between them" means neighboring features, a mean operation should work:
import torch

batch_size = 10
features = 20
x = torch.arange(batch_size * features).view(batch_size, features).float()
x = x.view(batch_size, -1, 2)  # group neighboring features into pairs
y = x.mean(dim=2)              # average each pair -> [10, 10]
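Alternatively, if you'd rather keep nn.AvgPool1d, you could add the missing channel dimension with unsqueeze and remove it again afterwards. A sketch under the same assumed shapes:

```python
import torch
import torch.nn as nn

batch_size, features = 10, 20
x = torch.arange(batch_size * features).view(batch_size, features).float()

pool = nn.AvgPool1d(kernel_size=2, stride=2)
# [10, 20] -> [10, 1, 20] -> [10, 1, 10] -> [10, 10]
y = pool(x.unsqueeze(1)).squeeze(1)
print(y.shape)  # torch.Size([10, 10])
```

Both approaches average neighboring pairs of features and produce identical results.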