Hi everyone,
I am trying to translate a modified ResNet model architecture written in MXNet (Gluon) to PyTorch. Below is the architecture:
SegmentationNetwork(
(cnn): HybridSequential(
(0): Conv2D(1 → 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(1, 1), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
(4): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 → 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): Conv2D(64 → 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 → 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): Conv2D(64 → 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)
)
(2): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 → 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
(2): Activation(relu)
(3): Conv2D(64 → 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)
)
)
(5): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 → 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
(downsample): HybridSequential(
(0): Conv2D(64 → 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
)
(2): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
)
(3): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
)
)
(6): HybridSequential(
(0): Flatten
(1): Dense(None → 64, Activation(relu))
(2): Dropout(p = 0.5, axes=())
(3): Dense(None → 64, Activation(relu))
(4): Dropout(p = 0.5, axes=())
(5): Dense(None → 4, Activation(sigmoid))
)
)
)
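For reference, my understanding is that the stem (layers 0-3 of `cnn`) maps one-to-one onto a torchvision-style ResNet stem, apart from the single input channel. A minimal PyTorch sketch of just that stem (module names and the input size are my own, for illustration):

```python
import torch
import torch.nn as nn

# Sketch of the stem (cnn layers 0-3): Conv -> BN -> ReLU -> MaxPool.
# Note: MXNet's BatchNorm momentum=0.9 corresponds to momentum=0.1 in
# PyTorch, because the two frameworks define momentum complementarily.
stem = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64, eps=1e-5, momentum=0.1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

x = torch.randn(2, 1, 64, 64)   # assumed input size, just for shape checking
out = stem(x)
print(out.shape)  # torch.Size([2, 64, 16, 16])
```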
I can understand most of the architecture, but the last block is quite confusing:
(6): HybridSequential(
(0): Flatten
(1): Dense(None → 64, Activation(relu))
(2): Dropout(p = 0.5, axes=())
(3): Dense(None → 64, Activation(relu))
(4): Dropout(p = 0.5, axes=())
(5): Dense(None → 4, Activation(sigmoid))
)
How do I translate None as an input dimension to PyTorch's nn.Linear? As far as I know it is the input dimension, but how can an input have a None dimension? The other confusing part is the downsampling block:
(downsample): HybridSequential(
(0): Conv2D(64 → 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
The previous block is
(body): HybridSequential(
(0): Conv2D(64 → 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
(2): Activation(relu)
(3): Conv2D(128 → 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
)
How come the input to the downsampling block (64 channels) is not the same as the output of the previous body block (128 channels)?
Below is the script used to produce the last block:
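My current reading is that the downsample branch is applied to the residual (skip) input of the block, i.e. the 64-channel tensor that enters BasicBlockV1, not to the 128-channel output of `body`, so that both paths end up at 128 channels and half the resolution before they are added. A hedged PyTorch sketch of that assumption (class and argument names are my own):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Sketch of BasicBlockV1: the downsample branch transforms the block
    *input* so it matches the body output for the residual addition."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 conv on the skip path only when shape or channel count changes.
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        return torch.relu(self.body(x) + identity)

block = BasicBlock(64, 128, stride=2)
y = block(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 128, 8, 8])
```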
output.add(gluon.nn.Flatten())
output.add(gluon.nn.Dense(64, activation='relu'))
output.add(gluon.nn.Dropout(p_dropout))
output.add(gluon.nn.Dense(64, activation='relu'))
output.add(gluon.nn.Dropout(p_dropout))
output.add(gluon.nn.Dense(4, activation='sigmoid'))
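As I understand it, Gluon's `Dense` shows None because it infers `in_features` lazily at the first forward pass; only the first Dense after `Flatten` actually has an unknown size (the later ones receive 64 features). A sketch of how this head might look in PyTorch using `nn.LazyLinear` (available since PyTorch 1.8) for the inferred dimension; the feature-map size `128x8x8` is an assumption for the demo:

```python
import torch
import torch.nn as nn

# Gluon's Dense(None -> 64) infers in_features at the first forward pass;
# nn.LazyLinear does the same. Gluon's Dense(units, activation=...) applies
# the activation after the affine layer, hence the explicit ReLU/Sigmoid here.
head = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(64),      # in_features inferred on first call
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(64, 64),      # known: previous Dense outputs 64 features
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(64, 4),
    nn.Sigmoid(),
)

# The first call materializes the lazy layer's weight (here 128*8*8 -> 64).
out = head(torch.randn(2, 128, 8, 8))
print(out.shape)  # torch.Size([2, 4])
```

Alternatively, on older PyTorch versions the flattened size can be computed by hand from the input resolution and used in a plain `nn.Linear`.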
Please advise.