@ptrblck Thanks for the reply. To follow up on that, I built a small ConvNet (below) and have a few questions regarding padding and out_channels.
The basic ConvNet is:
import torch
import torch.nn as nn


class OurConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.projection = None
        # sig dims expected: HxWxC = 36x36x1
        # dep dims expected: HxWxC = 36x36x1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=31, kernel_size=1, stride=1, padding=0)
        # In forward, the conv2 output (31 channels) and dep (1 channel) are concatenated
        # to get 32 channels, hence the in_channels of self.conv3 is 32. But, is it right?
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=0)
        self.conv4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=0)

    def forward(self, sig, dep):
        # Apply convolution 1
        x = self.conv1(sig)
        print(f'conv1: {x.size()}')
        # Apply tanh activation
        x = torch.tanh(x)
        # Apply convolution 2 (1x1)
        x = self.conv2(x)
        # concat along the channel dimension => is it correct?
        concat = torch.cat((x, dep), dim=1)
        print(f'concat: {concat.shape}')
        # Apply convolution 3
        x = self.conv3(concat)
        print(f'conv3: {x.size()}')
        # Apply convolution 4
        x = self.conv4(x)
        print(f'conv4: {x.size()}')
        # Apply ReLU activation
        x = torch.relu(x)
        return x


sig = torch.randn(1, 1, 36, 36)
dep = torch.randn(1, 1, 36, 36)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = OurConvNet()
print(net)
y = net(sig, dep)
Printing the net gives:
OurConvNet(
(conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv2): Conv2d(32, 31, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
(conv4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
As one can notice, self.conv2 has 31 output channels; another tensor is then concatenated to its output in the forward method to make it 32 channels, which becomes the input of self.conv3.
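For reference, the channel arithmetic described above can be checked in isolation. This is only a minimal sketch: it rebuilds the first three layers as standalone modules (not the full class) and feeds random tensors with the shapes from the post.

```python
import torch
import torch.nn as nn

# Random inputs with the shapes used in the post (N, C, H, W)
sig = torch.randn(1, 1, 36, 36)
dep = torch.randn(1, 1, 36, 36)

# Standalone copies of the first three layers, for illustration only
conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(32, 31, kernel_size=1, stride=1, padding=0)
conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=0)

x = conv2(torch.tanh(conv1(sig)))     # 31 channels, spatial size unchanged
concat = torch.cat((x, dep), dim=1)   # dim=1 is the channel axis in NCHW
out = conv3(concat)                   # 3x3 kernel without padding trims 2 per side pair

x_shape = tuple(x.shape)
concat_shape = tuple(concat.shape)
out_shape = tuple(out.shape)
print(x_shape, concat_shape, out_shape)
```

torch.cat only requires that all dimensions other than dim match, so the 36x36 spatial size of both tensors is what makes the concatenation valid here.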
My questions, however, are:
- Will using an odd number (31) of output channels be a problem? I have always come across output channel counts that are a power of 2, i.e. 32, 64, 128, 256, etc.
- Similarly, can I use non-square padding, i.e. padding=(3, 0)? As I had mentioned in my question, I did some hand calculations to modify the resnet18 architecture and had to use non-square padding to get to the (44, 120) shape.
- I can use summary from torchsummary when there is only one input. But how do I get the summary of the above model, which takes two inputs?
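A rough way to probe all three points at once, sketched under assumptions: TwoInputNet and summarize below are hypothetical names made up for illustration, not part of torchsummary or of the model above. The sketch uses an odd channel count, a tuple padding of (3, 0), and plain forward hooks to collect per-layer output shapes for a model with two inputs.

```python
import torch
import torch.nn as nn


class TwoInputNet(nn.Module):
    """Hypothetical two-input model, for illustration only."""

    def __init__(self):
        super().__init__()
        # 31 output channels: an odd count is accepted like any other
        self.conv_a = nn.Conv2d(1, 31, kernel_size=3, padding=1)
        # Tuple padding is (height, width): 3 rows on top/bottom, 0 columns left/right
        self.conv_b = nn.Conv2d(32, 64, kernel_size=3, padding=(3, 0))

    def forward(self, sig, dep):
        x = torch.tanh(self.conv_a(sig))
        x = torch.cat((x, dep), dim=1)
        return torch.relu(self.conv_b(x))


def summarize(model, *inputs):
    """Run one forward pass and collect (layer name, output shape) pairs."""
    rows, handles = [], []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            def hook(mod, inp, out, name=name):
                rows.append((name, tuple(out.shape)))
            handles.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(*inputs)
    for handle in handles:
        handle.remove()  # leave the model without hooks afterwards
    return rows


net = TwoInputNet()
rows = summarize(net, torch.randn(1, 1, 36, 36), torch.randn(1, 1, 36, 36))
for name, shape in rows:
    print(f"{name:10s} {shape}")
```

With padding=(3, 0) and a 3x3 kernel, the height grows from 36 to 40 while the width shrinks from 36 to 34, so the output is no longer square. The hook approach only reports modules that are actually called in forward, so functional calls such as torch.tanh do not appear in the list.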
Thanks for bearing with me.