ResNet, Bottleneck, Layers, groups, width_per_group

I was searching the web for how to implement ResNet in PyTorch and found the code below. I have spent a couple of days looking for an explanation of how it works and have not found anything, so I hope to find information here (or pointers to documentation) about what these parameters are and how they are used.

import torch.nn as nn
import torchvision.models as models

class MyResNeXt(models.resnet.ResNet):
    def __init__(self, training=True):
        # ResNeXt-50 32x4d settings: 32 groups with 4 channels per group
        super(MyResNeXt, self).__init__(block=models.resnet.Bottleneck,
                                        layers=[3, 4, 6, 3],
                                        groups=32,
                                        width_per_group=4)
        self.fc = nn.Linear(2048, 1)
    1.- Why do we init the ResNet with the Bottleneck? Any docs?
    2.- layers=[3, 4, 6, 3]: does this depend on the type of ResNet (18, 50, 101, etc.)?
    3.- What is groups?
    4.- width_per_group: can we use different sizes, e.g. 8, 16, 32?
    5.- self.fc = nn.Linear(2048, 1) is binary, but why not 2? In Keras you can use 2 as the final output.
  1. The subblocks of the ResNet architecture are defined as either BasicBlock or Bottleneck, depending on the ResNet depth: resnet18 and resnet34 use BasicBlock, while resnet50 and deeper models use Bottleneck.

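  2. Yes. Your configuration [3, 4, 6, 3] is used by both resnet34 (with BasicBlock) and resnet50 (with Bottleneck) in torchvision, while e.g. resnet18 uses [2, 2, 2, 2] and resnet101 uses [3, 4, 23, 3].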

  3. Bottleneck layers support the groups argument, which is passed to their 3x3 convolution to create grouped convolutions (as used in ResNeXt).

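  4. Again, this is a ResNeXt-specific setup for the Bottleneck layer: torchvision computes the width of the grouped 3x3 convolution inside each block as int(planes * width_per_group / 64) * groups. You could try different values, but you would most likely have to look into the paper to see how these values interact with the channels etc. Note that BasicBlock only supports groups=1 and width_per_group=64.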

  5. You could treat your binary classification use case as a 2 class multi-class classification use case, if you set the number of output features to 2.


@ptrblck, wow!! This is great support. I am coming from the world of Keras and TensorFlow, and I believe that PyTorch is much better.

A couple of questions:
1.- Why does almost all the sample code I see on the web do it like this:
resnet = models.resnet50(pretrained=True), and not the other way? Is there any advantage?

2.- How can you fine-tune this type of model?
3.- Can you concatenate models as you do in Keras?

Thanks a lot!

  1. Creating the resnet50 in a single line of code with pretrained weights is quite convenient instead of writing a custom class. If you don’t want to change e.g. the forward pass or any other modules, you could just stick to the torchvision.models.

  2. Have a look at the transfer learning and finetuning tutorials in the official PyTorch tutorials for an introduction to finetuning the models.

  3. You can create the computation graph dynamically in any form you wish.
    E.g. if you want to feed the output of one model to another one, you can just write:

output = model1(x)
output = model2(output)
loss = criterion(output, target)

Autograd will make sure to create the gradients in both models as long as you haven’t detached a tensor from the computation graph (e.g. by using numpy methods or calling tensor.detach()).
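A small sketch of this behavior, using two hypothetical toy `nn.Linear` models in place of `model1` and `model2`. It checks that gradients reach both models through the chained call, and that calling `detach()` on the intermediate output cuts `model1` off from the graph.

```python
import torch
import torch.nn as nn

# Hypothetical toy models standing in for model1 and model2
model1 = nn.Linear(10, 5)
model2 = nn.Linear(5, 3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10)
target = torch.tensor([0, 2, 1, 0])  # class indices for 3 classes

# Chain the models as in the snippet above
output = model1(x)
output = model2(output)
loss = criterion(output, target)
loss.backward()

# Autograd computed gradients for the parameters of both models
both_have_grads = model1.weight.grad is not None and model2.weight.grad is not None

# Detaching the intermediate output cuts the graph: model1 receives no gradients
model1.zero_grad(set_to_none=True)
model2.zero_grad(set_to_none=True)
output = model2(model1(x).detach())
criterion(output, target).backward()
model1_cut_off = model1.weight.grad is None and model2.weight.grad is not None

print(both_have_grads, model1_cut_off)  # True True
```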

Love you man!!! Finally I am learning PyTorch!!!


Hi @ptrblck, I am looking at the second tutorial you linked, but there is something I am missing:

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)

Then you create another model…

model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

They have different model names. In the first one, you keep all the layers and only replace the last linear layer with one that has 2 outputs.

In the second piece of code, you freeze all layers first, and then the rest is basically the same.

BUT I don't see any fine-tuning here… at least not the Keras way of doing it, where we keep using the same model, while here there are different models.

Do you mean we have to do this, or what am I missing?

output = model_ft(x)
output = model2(model_conv)
loss = criterion(output, target)

Thanks a lot for your help!

Maybe I’m using the wrong terminology, but by fine tuning a model I mean to use a pretrained model, make some necessary changes for the new dataset (e.g. number of output units) and train this model using the new dataset.

The tutorial explains two different approaches, where the first one trains all parameters, while the latter one only trains the last output layer.

Passing the output of one model to another one is independent from your fine tuning use case, so you can ignore it for now.

@ptrblck ahh, I understand now. It is completely different in Keras, and a lot easier in PyTorch hehe.

So when this person on the internet declares the ResNet this way, he is using the finetuning technique, right?

class MyResNeXt(models.resnet.ResNet):
    def __init__(self, training=True):
        super(MyResNeXt, self).__init__(block=models.resnet.Bottleneck,
                                        layers=[3, 4, 6, 3],
                                        groups=32,
                                        width_per_group=4)
        self.fc = nn.Linear(2048, 1)

I was looking into the second link and was wondering how you go about instantiating the model. Is this correct? Or where can I find information on how to instantiate ResNet-50 this way?

model_ft = models.resnet.ResNet(block=models.resnet.Bottleneck,
                                layers=[3, 4, 6, 3])
model_ft.fc = nn.Linear(2048, 1)

Thanks again for your valuable and fast help!

Can you point me to where I can learn how to concatenate models in PyTorch, or tell me what this technique is called?

output = model1(x)
output = model2(output)

Not necessarily. I refer to “fine tuning” as using pretrained parameters and train the model on another dataset.
The author of the mentioned code snippet creates a new model called MyResNeXt by deriving from models.resnet.ResNet as the base model. This allows him to reuse all parent modules and might make implementing his model easier.
The author doesn’t have to use the pretrained parameters of the base class, so it’s unrelated to a fine tuning task.

I would recommend using the torchvision.models if they fit your use case. Initializing the ResNet “manually”, as shown in your code snippet, is only needed if you want to manipulate the model in a non-trivial way (e.g. change whole blocks inside the model).

I might have used the wrong wording again, but there is not really anything special about this approach.
You are writing models in the same way by just passing the output of one layer to the next one:

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        # in_channels must match the 6 output channels of conv1
        self.conv2 = nn.Conv2d(6, 6, 3, 1, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.conv2(x)
        return x

In this example you are passing the output of the first convolution to the next one.
Since MyModel derives from nn.Module it can be treated as all other nn.Modules e.g. nn.Conv2d.
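Building on that, a model that contains MyModel can be written the same way, which is the usual PyTorch answer to "concatenating" models. A minimal sketch; the wrapper `Classifier` and its layer sizes are hypothetical, and the second variant shows the same chaining with `nn.Sequential`.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    # Same as above, with conv2 taking conv1's 6 output channels
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        self.conv2 = nn.Conv2d(6, 6, 3, 1, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        return self.conv2(x)

class Classifier(nn.Module):
    # Hypothetical wrapper that uses MyModel as a submodule, just like nn.Conv2d
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = MyModel()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(6, num_classes)

    def forward(self, x):
        x = self.features(x)         # [N, 6, H, W]
        x = self.pool(x).flatten(1)  # [N, 6]
        return self.head(x)          # [N, num_classes]

model = Classifier()
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 2])

# The same chaining without a custom class
seq = nn.Sequential(MyModel(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(6, 2))
seq_out = seq(torch.randn(4, 3, 32, 32))
print(seq_out.shape)  # torch.Size([4, 2])
```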

Hi @ptrblck, I really appreciate your help and time.


Hi @ptrblck, sorry, I need your help again. I am trying to learn how to implement a custom class for resnet50 and larger models.

I am trying this, but it is giving me an error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-31-44bf8238479a> in <module>
     13     checkpoint = torch.load("../models/resnext50_32x4d-7cdf4587.pth")
---> 15     model_ft = MyResNeXt().to(device) # models.resnet18(pretrained=True)
     17     del checkpoint
 RuntimeError: CUDA error: device-side assert triggered

Here is the code:

import os

import torch
import torch.nn as nn
import torchvision.models as models
from torch.optim import lr_scheduler

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class MyResNeXt(models.resnet.ResNet):
    def __init__(self, training=True):
        # ResNeXt-50 32x4d settings: 32 groups with 4 channels per group
        super(MyResNeXt, self).__init__(block=models.resnet.Bottleneck,
                                        layers=[3, 4, 6, 3],
                                        groups=32,
                                        width_per_group=4)

        # Override the existing FC layer with a new one.
        self.fc = nn.Linear(2048, 1)

def freeze_until(net, param_name):
    # Freeze all parameters before param_name; train it and everything after it
    found_name = False
    for name, params in net.named_parameters():
        if name == param_name:
            found_name = True
        params.requires_grad = found_name

# Finetuning the convnet
# ----------------------
# Load a pretrained model and reset final fully connected layer.
# loop over the number of models to train
for i in range(10):
    # initialize the optimizer and model
    print("[INFO] training model {}/{}".format(i + 1, 10))
    checkpoint = torch.load("../models/resnext50_32x4d-7cdf4587.pth")
    model_ft = MyResNeXt().to(device) # models.resnet18(pretrained=True)
    # Copy the pretrained weights, skipping the checkpoint's 1000-class fc layer
    model_ft.load_state_dict({k: v for k, v in checkpoint.items() if not k.startswith("fc.")},
                             strict=False)
    del checkpoint
    freeze_until(model_ft, "layer4.0.conv1.weight")

    criterion = nn.CrossEntropyLoss()

    # All parameters are passed to Adam, but frozen ones get no gradients and are skipped
    optimizer_ft = torch.optim.Adam(model_ft.parameters(), lr=1e-5)

    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
    # Train and evaluate (train_model is the training loop from the transfer learning tutorial, not shown)
    model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)
    # save the model to disk
    p = ["models", "model_resnet18_adam{}.model".format(i)]
    checkpoint = {"optimizer": optimizer_ft.state_dict(), "model": model_ft.state_dict()}
    torch.save(checkpoint, os.path.join(*p))

Thanks again for your help!!! I have really been trying to find advanced info about PyTorch on the web, but nothing really goes beyond MNIST…

@ptrblck I found the solution on the forum. You and the forum are of great help!

“Sir, I have got the error. It was because i had two classes and was using 1 at the output layer, which is used in other frameworks. But as I changed it to 2 my code is running. Similar thing happened in the age data where total number of classes were 104 but the actual age was from 1-116 and I was f…”

But I thought that we needed to use 1 output for two-class classification, and 2 or more outputs only for more than two classes.

Thanks again,

No. You can use a single output with e.g. nn.BCEWithLogitsLoss for a binary classification.
Alternatively you could also use two outputs with nn.CrossEntropyLoss for a “two class classification”, which would also classify each sample to one of two classes.

Note that the latter approach will double the output units and thus the last weight matrix.

Besides using another loss function, the target would also be different.
nn.BCEWithLogitsLoss expects the target as a FloatTensor with values in the range [0, 1], while nn.CrossEntropyLoss expects it to be a LongTensor with class indices in the range [0, num_classes-1].
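Both options side by side, as a sketch assuming 2048 pooled ResNet-50 features and made-up labels. The last part reproduces the bug from this thread on the CPU: a single output unit with nn.CrossEntropyLoss only has class index 0, so a target of 1 is out of range (on the GPU this is what surfaces as "device-side assert triggered").

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.randn(4, 2048)     # e.g. pooled ResNet-50 features (dummy data)
labels = torch.tensor([0, 1, 1, 0])  # made-up binary labels

# Option 1: one output unit + BCEWithLogitsLoss, float targets in [0, 1]
fc1 = nn.Linear(2048, 1)
logits1 = fc1(features)  # shape [4, 1]
loss1 = nn.BCEWithLogitsLoss()(logits1, labels.float().unsqueeze(1))
pred1 = (logits1 > 0).long().squeeze(1)  # logit > 0 <=> sigmoid > 0.5

# Option 2: two output units + CrossEntropyLoss, LongTensor class indices
fc2 = nn.Linear(2048, 2)
logits2 = fc2(features)  # shape [4, 2]
loss2 = nn.CrossEntropyLoss()(logits2, labels)
pred2 = logits2.argmax(dim=1)

# The last weight matrix doubles: [1, 2048] vs. [2, 2048]
print(fc1.weight.shape, fc2.weight.shape)

# The bug from this thread: one output unit with CrossEntropyLoss
try:
    nn.CrossEntropyLoss()(logits1, labels)
    single_output_ce_fails = False
except (IndexError, RuntimeError):
    single_output_ce_fails = True
print(single_output_ce_fails)  # True
```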

Hi @ptrblck I see and understand now!!! THANKS!!!

Thanks for the explanation