Untrained feature extractor with batch norm layers active

I was wondering how I can keep my batch norm layers active when using an untrained feature extraction network.

Would this be considered feature extraction with an “untrained” network?

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class DenseNetConv(torch.nn.Module):
    def __init__(self):
        super(DenseNetConv, self).__init__()
        # pretrained=False gives a randomly initialised (untrained) backbone
        original_model = models.densenet161(pretrained=False)
        self.features = torch.nn.Sequential(*list(original_model.children())[:-1])
        self.avgpool = nn.AdaptiveAvgPool2d(1)  # defined but unused; forward uses F.avg_pool2d instead
        # freeze every backbone parameter (conv weights and BN affine weights/biases)
        for param in self.parameters():
            param.requires_grad = False

    def forward(self, x):
        x = self.features(x)
        x = F.relu(x, inplace=True)
        # assumes 224x224 inputs, so the final feature map is 7x7 before pooling
        x = F.avg_pool2d(x, kernel_size=7).view(x.size(0), -1)
        return x

The above should return a tensor of shape [batch_size, 2208]. However, I want to make sure that by setting pretrained=False I am essentially extracting features from an untrained network.
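For reference, a quick sanity check of the output shape could look like this (it assumes a standard 224x224 input, which is what the kernel_size=7 pooling expects; the variable names are just placeholders):

extractor = DenseNetConv()
extractor.eval()
with torch.no_grad():
    dummy = torch.randn(4, 3, 224, 224)  # batch of 4 RGB images
    out = extractor(dummy)
print(out.shape)  # expected: torch.Size([4, 2208])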

I then use the following to define the classifier layers:

class MyDenseNetDens(torch.nn.Module):
    def __init__(self, nb_out=2):
        super().__init__()
        self.dens1 = torch.nn.Linear(in_features=2208, out_features=512)
        self.dens2 = torch.nn.Linear(in_features=512, out_features=128)
        self.dens3 = torch.nn.Linear(in_features=128, out_features=nb_out)

    def forward(self, x):
        x = self.dens1(x)
        x = F.selu(x)
        x = F.dropout(x, p=0.25, training=self.training)
        x = self.dens2(x)
        x = F.selu(x)
        x = F.dropout(x, p=0.25, training=self.training)
        x = self.dens3(x)
        return x

and finally join them together here:

class MyDenseNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mrnc = DenseNetConv()  # the frozen, untrained feature extractor defined above
        self.mrnd = MyDenseNetDens()
    def forward(self, x):
        x = self.mrnc(x)
        x = self.mrnd(x)
        return x 

densenet = MyDenseNet()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
densenet.to(device)
densenet.train()

If I allow this to train, for example by calling densenet.train(), will that be sufficient for batch normalisation statistics to be computed on each mini-batch, and for the running means and standard deviations to be updated and then applied during inference, while keeping the convolutional layers untrained?

Yes, I guess you could call DenseNetConv an untrained feature extractor.

Yes. You’ve frozen all trainable parameters of the feature extractor by setting their requires_grad attribute to False, while the linear layers in MyDenseNetDens are still trainable, since their attributes were not changed. Note that requires_grad=False only affects the batch norm layers’ affine weight and bias; the running mean and variance are buffers rather than parameters, so this flag does not stop them from being updated. Calling .train() on the model will therefore normalise the forward activations with the per-batch statistics and keep updating the running statistics, which are then used once you switch to .eval() for inference.
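If you want to convince yourself of that behaviour, here is a minimal sketch (the input size and the optimizer choice are just examples): it grabs one BatchNorm2d layer from the frozen backbone, checks that its running_mean changes after a forward pass in training mode, and shows how to hand only the trainable parameters to the optimizer.

densenet = MyDenseNet()
densenet.train()

# pick any BatchNorm2d layer from the frozen backbone
bn = next(m for m in densenet.mrnc.modules() if isinstance(m, torch.nn.BatchNorm2d))
before = bn.running_mean.clone()

_ = densenet(torch.randn(8, 3, 224, 224))
print(torch.allclose(before, bn.running_mean))  # should print False: running stats were updated

# only the classifier parameters require gradients, so the optimizer only needs those
optimizer = torch.optim.Adam((p for p in densenet.parameters() if p.requires_grad), lr=1e-3)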
