The convolutional layers (e.g. `nn.Conv2d`) require `groups` to divide both `in_channels` and `out_channels`. The functional convolutions (e.g. `nn.functional.conv2d`) only require `groups` to divide `in_channels`.
This leads to confusing behavior:
```python
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F

# Test configuration: 1x1 image, 1x1 filters, so shapes are easy to follow.
batch_size = 1
w_img = 1
h_img = 1
c_in = 6
c_out = 9
filter_len = 1
groups = 3

image = np.arange(6, dtype=np.float32).reshape(batch_size, c_in, h_img, w_img)
filters = np.empty((c_out, c_in // groups, filter_len, filter_len), dtype=np.float32)
filters.fill(0.5)
image = torch.tensor(image)
filters = torch.tensor(filters)

features_functional = F.conv2d(image, filters, padding=filter_len // 2, groups=groups)
print(features_functional.shape[1])  # 9

layer = nn.Conv2d(c_in, c_out, filter_len, padding=filter_len // 2, groups=groups)
print(layer.out_channels)  # 9
```
Here both forms have 9 `out_channels`. Changing `groups` to 2, however, results in 8 `out_channels` from the functional form and an exception thrown from the other.
I see two problems with this:
- Inconsistency (despite the fact that both are documented correctly).
- The functional form opaquely rounds `out_channels` down to the nearest integer that is divisible by `groups`. This is a non-obvious process for the user.
Is there a reason for this difference? If not, it seems like the functional form should throw a similar error.