Conv2d/Maxpool2d and Conv3d/Maxpool3d

jhoanmartinez · April 12, 2022, 2:12pm

In the CIFAR-10 tutorial for PyTorch ("Training a Classifier", PyTorch Tutorials 1.11.0+cu102 documentation), why are Conv2d and MaxPool2d used if the images have a 3D shape?

import torch  # needed for torch.flatten in forward()
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 input channels (RGB), 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)    # 2x2 window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

AlphaBetaGamma96 · April 12, 2022, 2:15pm

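Images are 2D, but color images also have a ‘channel’ axis. That channel count is the first argument to your Conv2d objects (the 3 in nn.Conv2d(3, 6, 5) is in_channels, one per RGB channel). The kernel still slides over only the two spatial dimensions, height and width, which is why the 2d layers are the right ones here. Conv3d and MaxPool3d are for data with three spatial dimensions, such as video or volumetric scans. There’s more information in the Conv2d documentation.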
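To see the difference in shapes, here is a minimal sketch. The batch size of 4, the random tensors, and the 8-frame "video" input are assumptions made up for illustration; only the layer arguments (3, 6, 5) and the 32x32 CIFAR-10 image size come from the tutorial.

```python
import torch
import torch.nn as nn

# A batch of 4 RGB images, CIFAR-10 sized: (batch, channels, height, width)
images = torch.randn(4, 3, 32, 32)

conv2d = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
pool2d = nn.MaxPool2d(2, 2)

conv_out = conv2d(images)
pool_out = pool2d(conv_out)

print(conv_out.shape)       # torch.Size([4, 6, 28, 28])
print(pool_out.shape)       # torch.Size([4, 6, 14, 14])

# Each 2D kernel spans all 3 input channels but moves only along H and W
print(conv2d.weight.shape)  # torch.Size([6, 3, 5, 5])

# Conv3d expects an extra spatial axis: (batch, channels, depth, height, width).
# Hypothetical input: 4 clips of 8 RGB frames each.
clips = torch.randn(4, 3, 8, 32, 32)
conv3d = nn.Conv3d(in_channels=3, out_channels=6, kernel_size=5)
clip_out = conv3d(clips)

print(clip_out.shape)       # torch.Size([4, 6, 4, 28, 28])
print(conv3d.weight.shape)  # torch.Size([6, 3, 5, 5, 5])
```

In both cases the channel axis is consumed by the kernel rather than slid over, so the "2d" or "3d" in the layer name counts spatial dimensions only, not the channel dimension.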