I searched the forum but couldn’t find any topic describing exactly the same problem as the one below.
I am currently doing research on deep learning for computer vision.
I noticed that “Conv2d” has an option to specify a “groups” value.
According to the PyTorch docs:
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
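A minimal sketch of what the quoted sentence describes, run on a single device (CPU here). The channel counts 96 and 192 are borrowed from conv2 in the listing further down; the names grouped, half_a and half_b and the 8x8 input are made up for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One grouped convolution: 96 input channels, 192 output channels, 2 groups.
grouped = nn.Conv2d(96, 192, kernel_size=3, padding=1, groups=2)

# Two ordinary convolutions, each seeing 48 input channels and producing 96.
half_a = nn.Conv2d(48, 96, kernel_size=3, padding=1)
half_b = nn.Conv2d(48, 96, kernel_size=3, padding=1)

# Copy the grouped parameters into the two halves.
# The first 96 filters belong to group 0, the last 96 to group 1.
with torch.no_grad():
    half_a.weight.copy_(grouped.weight[:96])
    half_a.bias.copy_(grouped.bias[:96])
    half_b.weight.copy_(grouped.weight[96:])
    half_b.bias.copy_(grouped.bias[96:])

x = torch.randn(1, 96, 8, 8)  # arbitrary example input (assumption)

with torch.no_grad():
    out_grouped = grouped(x)
    # Each half sees its own 48 input channels; the results are concatenated.
    out_split = torch.cat([half_a(x[:, :48]), half_b(x[:, 48:])], dim=1)

weight_shape = tuple(grouped.weight.shape)
output_shape = tuple(out_grouped.shape)
same_output = torch.allclose(out_grouped, out_split, atol=1e-4)
print(weight_shape, output_shape, same_output)
```

If this behaves as the docs describe, the grouped layer's weight has 48 input channels per filter, and the grouped output matches the concatenation of the two halves without any second GPU being involved.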
If I’m right, groups can also be used to split the load across multiple GPUs: with groups=2, across 2 GPUs (the original AlexNet was trained that way on two less powerful GPUs).
My question is: what happens if I specify groups=2 but have only a single GPU? Are half of the channels from the previous layer’s output dropped, so that information is lost? Or does PyTorch concatenate the two groups and feed them to the next layer as if there were no groups (i.e. groups=1)?
I am currently training my network on Google Colab with a GPU, and as far as I know, Colab assigns just one GPU per session.
Excerpt from the model’s state dict (conv2, conv4 and conv5 use groups=2):
conv.conv1_s1.weight torch.Size([96, 3, 25, 7])
conv.conv1_s1.bias torch.Size([96])
conv.conv2_s1.weight torch.Size([192, 48, 3, 3])
conv.conv2_s1.bias torch.Size([192])
conv.conv3_s1.weight torch.Size([256, 192, 3, 3])
conv.conv3_s1.bias torch.Size([256])
conv.conv4_s1.weight torch.Size([256, 128, 3, 3])
conv.conv4_s1.bias torch.Size([256])
conv.conv5_s1.weight torch.Size([192, 128, 3, 3])
conv.conv5_s1.bias torch.Size([192])
fc6.fc6_s1.weight torch.Size([512, 6336])
fc6.fc6_s1.bias torch.Size([512])
fc7.fc7.weight torch.Size([256, 2048])
fc7.fc7.bias torch.Size([256])
classifier.fc8.weight torch.Size([24, 256])
classifier.fc8.bias torch.Size([24])
Total parameters: 4868056
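As a sanity check on the listing above, here is a small sketch that recomputes the weight shapes and the parameter total from the rule that a Conv2d weight has shape [out_channels, in_channels / groups, kernel_h, kernel_w]. The in_channels values for the grouped layers (96, 256, 256) are an assumption inferred from the previous layer's output channels; the layer tuples and the helper conv_weight_shape are illustrative, not taken from the actual model code.

```python
# (name, out_channels, in_channels, kernel_h, kernel_w, groups)
# in_channels of the grouped layers is inferred, not read from the model.
conv_layers = [
    ("conv1", 96, 3, 25, 7, 1),
    ("conv2", 192, 96, 3, 3, 2),
    ("conv3", 256, 192, 3, 3, 1),
    ("conv4", 256, 256, 3, 3, 2),
    ("conv5", 192, 256, 3, 3, 2),
]

# (name, out_features, in_features), copied from the state dict shapes.
fc_layers = [
    ("fc6", 512, 6336),
    ("fc7", 256, 2048),
    ("fc8", 24, 256),
]


def conv_weight_shape(out_ch, in_ch, kh, kw, groups):
    """Shape of a grouped Conv2d weight: each filter sees in_ch // groups channels."""
    return (out_ch, in_ch // groups, kh, kw)


shapes = {}
total = 0

for name, out_ch, in_ch, kh, kw, groups in conv_layers:
    shape = conv_weight_shape(out_ch, in_ch, kh, kw, groups)
    shapes[name] = shape
    weights = shape[0] * shape[1] * shape[2] * shape[3]
    total += weights + out_ch  # weights plus one bias per output channel

for name, out_f, in_f in fc_layers:
    total += out_f * in_f + out_f  # weights plus one bias per output feature

for name, shape in shapes.items():
    print(name, shape)
print("Total parameters:", total)  # 4868056, matching the listing
```

Under these assumptions the grouped layers come out with 48 and 128 input channels per filter, exactly as in the state dict, which suggests that every input channel is still consumed by one of the two groups.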
Many thanks.