Inputting an image to a bunch of stacked ResNet blocks vs stacked Convolutional blocks

I am confused: why would we want to feed an image into a set of stacked ResNet blocks rather than into a stack of plain convolutional blocks, as is traditionally done?

Also, how should we stack ResNet blocks on top of each other? Is there a code sample or paper that shows this?

I drew this diagram for illustration:

Wouldn’t this be the classical ResNet architecture as seen e.g. here?
Depending on the flavor of the ResNet, you could be using Bottleneck or BasicBlock as part of the “ResNet Blocks”.
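
For context, the defining feature of a “ResNet block” versus a plain stack of convolutions is the skip connection: the block computes F(x) + x, so the convolutions only need to learn a residual correction on top of the identity, which keeps gradients flowing through very deep stacks. A minimal BasicBlock-style sketch (the class name and layout here are illustrative, not the torchvision source):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal BasicBlock-style residual block (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                               # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                       # F(x) + x
        return self.relu(out)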

I am actually not sure what is meant by “ResNet block” here.

If this model is taken from another research paper, I would expect the paper to define the “ResNet Block”; my best guess would be something like:

import torchvision.models as models

model = models.resnet18()
print(model.layer1)
> Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (1): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
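
To address the stacking part of the question: residual blocks are ordinary modules, so chaining them with nn.Sequential is enough; layer1 above is built exactly this way. A minimal sketch, assuming the torchvision import path below:

import torch
import torch.nn as nn
from torchvision.models.resnet import BasicBlock

# Two stacked BasicBlocks, mirroring layer1 of resnet18
blocks = nn.Sequential(BasicBlock(64, 64), BasicBlock(64, 64))
x = torch.randn(1, 64, 56, 56)  # (batch, channels, height, width)
print(blocks(x).shape)          # torch.Size([1, 64, 56, 56])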

Thanks a lot for your response. Just checking: is this your definition of “ResNet block” as well?

https://d2l.ai/chapter_convolutional-modern/resnet.html

The figure would represent the BasicBlock.
