Understanding a DeepLabV3 architecture (porting from MXNet to PyTorch)

I am trying to understand the architecture of a pretrained model (DeepLabV3) trained in MXNet, and I am trying to implement the same model in PyTorch.

This is the architecture I get when I use print(model):

(a lot of preceding architecture, which is similar, omitted here)
(head): _DeepLabHead(
    (aspp): _ASPP(
      (concurent): HybridConcurrent(
        (0): HybridSequential(
          (0): Conv2D(2048 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
          (2): Activation(relu)
        )
        (1): HybridSequential(
          (0): Conv2D(2048 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
          (2): Activation(relu)
        )
        (2): HybridSequential(
          (0): Conv2D(2048 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(24, 24), dilation=(24, 24), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
          (2): Activation(relu)
        )
        (3): HybridSequential(
          (0): Conv2D(2048 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(36, 36), dilation=(36, 36), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
          (2): Activation(relu)
        )
        (4): _AsppPooling(
          (gap): HybridSequential(
            (0): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True, global_pool=True, pool_type=avg, layout=NCHW)
            (1): Conv2D(2048 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
            (3): Activation(relu)
          )
        )
      )
      (project): HybridSequential(
        (0): Conv2D(1280 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
        (2): Activation(relu)
        (3): Dropout(p = 0.5, axes=())
      )
    )
    (block): HybridSequential(
      (0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (2): Activation(relu)
      (3): Dropout(p = 0.1, axes=())
      (4): Conv2D(256 -> 150, kernel_size=(1, 1), stride=(1, 1))
    )

// MY QUESTION IS WITH RESPECT TO THE LINES BEFORE AND AFTER THIS COMMENT

  )
  (auxlayer): _FCNHead(
    (block): HybridSequential(
      (0): Conv2D(1024 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (2): Activation(relu)
      (3): Dropout(p = 0.1, axes=())
      (4): Conv2D(256 -> 150, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)

Please find the (//) comment in the output above.
I am able to create the model up until the comment. I am a little confused as to why and how, after ending at 150 channels, I must create a Conv2D that takes 1024 channels as input (the _FCNHead?). Note that the architecture chart above is produced by MXNet, not PyTorch.
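For context, the part I can recreate (the _ASPP block from the chart) maps to PyTorch roughly as follows. This is only a sketch: the class and helper names are my own, while the channel counts, dilations, and dropout rate are taken from the print-out above.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch, k, dilation=1):
    # the 1x1 branch needs no padding; each 3x3 branch pads by its dilation
    padding = 0 if k == 1 else dilation
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=padding, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ASPP(nn.Module):
    def __init__(self, in_ch=2048, out_ch=256, dilations=(12, 24, 36)):
        super().__init__()
        self.branches = nn.ModuleList([
            conv_bn_relu(in_ch, out_ch, 1),
            conv_bn_relu(in_ch, out_ch, 3, dilations[0]),
            conv_bn_relu(in_ch, out_ch, 3, dilations[1]),
            conv_bn_relu(in_ch, out_ch, 3, dilations[2]),
        ])
        # _AsppPooling: global average pool -> 1x1 conv -> BN -> ReLU
        self.gap = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # (project): 5 branches * 256 = 1280 -> 256, as in the chart
        self.project = nn.Sequential(
            nn.Conv2d(5 * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        # upsample the pooled branch back to the input's spatial size
        feats.append(F.interpolate(self.gap(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))

As a quick sanity check, ASPP().eval()(torch.randn(1, 2048, 60, 60)) should return a tensor of shape (1, 256, 60, 60) (eval mode, because BatchNorm cannot compute batch statistics over a 1x1 map with batch size 1).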

My PyTorch version (what I have built so far) is printed below:

  (project): Sequential(
          (0): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Dropout(p=0.5, inplace=False)
        )
      )
      (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
      (4): Conv2d(256, 150, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )
  )
  (BNorm2d): BatchNorm2d(150, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (Dropout): Dropout(p=0.1, inplace=False)
  (conv1): Conv2d(256, 150, kernel_size=(1, 1), stride=(1, 1))
)

What do I do after the 150-channel part? Any ideas or insights would really help me break through this barrier of understanding. Thank you!

I assume the model is not a pure sequential container, but uses some functional API calls which might not be shown in the print statement.
E.g. here it seems that MXNet is resizing the activation before passing it to the auxlayer. You would have to recreate the actual forward method in PyTorch, too.
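
If the model is GluonCV's DeepLabV3, its hybrid_forward would translate to something like this in PyTorch (a rough sketch; base_forward, head, and auxlayer stand for the corresponding submodules, and the key point is that the auxlayer receives the 1024-channel c3 feature map from the backbone, not the 150-channel output of the main head):

import torch.nn.functional as F

def forward(self, x):
    size = x.shape[2:]             # input height and width
    c3, c4 = self.base_forward(x)  # backbone features: c3 has 1024 channels, c4 has 2048
    out = self.head(c4)            # ASPP + block -> 150-channel logits
    out = F.interpolate(out, size=size, mode="bilinear", align_corners=True)
    if self.aux:
        aux = self.auxlayer(c3)    # _FCNHead: 1024 -> 256 -> 150
        aux = F.interpolate(aux, size=size, mode="bilinear", align_corners=True)
        return out, aux
    return out

So the 1024 input channels of the auxiliary head would not come after the 150-channel output; both heads branch off the backbone in parallel.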

Thank you for the reply.

So I need to upsample and bring my model from 150 channels back to 1024 channels (using bilinear interpolation, as they did, or transposed convolutions) and then follow up with

(1024, 256) -> (256, 150), right? (keeping the resolution of the multi-channel feature map as per my requirements)

Because as far as I can see, the _FCNHead, the HybridSequential, the HybridBlock, and the Block (one inheriting from the other) aren't really doing anything extra. Unless I am wrong. Any guidance is highly appreciated :))

I'm unsure how exactly the MXNet model works, but you should check the shapes of the intermediate activation tensors and make sure the actual forward pass is equal between the two frameworks.
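
One way to compare the shapes in PyTorch is to register forward hooks that print each leaf module's output shape (a generic sketch; model stands for your rebuilt network):

import torch
import torch.nn as nn

def log_shapes(model: nn.Module, x: torch.Tensor):
    handles = []
    for name, module in model.named_modules():
        # hook only leaf modules, so each printed line maps to one layer
        if len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(
                lambda m, inp, out, name=name: print(f"{name}: {tuple(out.shape)}")))
    model.eval()  # avoid BatchNorm batch-statistics issues with batch size 1
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()

# e.g. log_shapes(pytorch_model, torch.randn(1, 3, 480, 480))

You can then print the corresponding MXNet activations and compare the two line by line.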
