BatchNorm1d reimplementation needed - Expected more than 1 value per channel when training, got input size torch.Size([1, 1])

Hi, I have this problem with the following Module (this is from this implementation):

class TargetNet(nn.Module):
    def __init__(self):
        super(TargetNet, self).__init__()

        # L2
        self.fc1 = nn.Linear(365, 100)
        for i, m_name in enumerate(self._modules):
            if i > 2:
                nn.init.kaiming_normal_(self._modules[m_name].weight.data)
        self.bn1 = nn.BatchNorm1d(100).to(device) #.cuda()
        self.relu1 = nn.PReLU()
        self.drop1 = nn.Dropout(1 - 0.5)

        self.relu7 = nn.PReLU()
        self.relu7.to(device) #.cuda()
        self.sig = nn.Sigmoid()

    def forward(self, x, paras):
        q = self.fc1(x)
        q = self.bn1(q)
        q = self.relu1(q)
        q = self.drop1(q)

        self.lin = nn.Sequential(TargetFC(paras['res_last_out_w'], paras['res_last_out_b']))
        q = self.lin(q)
        bn7 = nn.BatchNorm1d(q.shape[0])
        bn7.to(device) # .cuda()
        q = bn7(q)
        q = self.relu7(q)

        return q

When running the network on a single image I get:

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 1])

Even though I put the network into eval mode, the bn7 BatchNorm1d stays in training mode, hence the error. When I put bn7 directly into eval mode, the inference works, but the results are completely off. Since the BatchNorm is created in the forward pass, I guess it just uses the default parameters, as they are never trained. How could I reimplement this part so it works for a single-image forward pass?

The reason I am trying to run this on a single image without a DataLoader is that I want to convert the model to ONNX, and the export requires passing a dummy image through the model, which in this case is not possible.
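For reference, this is the kind of export I have in mind. It is a generic sketch with a stand-in model (not the TargetNet above), just to show why a single-sample forward pass has to work: torch.onnx.export traces the model with the dummy input you give it.

import torch
import torch.nn as nn

# stand-in model, only for illustration
model = nn.Sequential(nn.Linear(365, 100), nn.ReLU(), nn.Linear(100, 1))
model.eval()
dummy_input = torch.randn(1, 365)   # a "batch" containing a single sample
torch.onnx.export(model, dummy_input, "model.onnx")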

This is expected, since you are not creating bn7 in the __init__ method and are thus not properly registering this module. Each forward pass will create a new bn7 = nn.BatchNorm1d(q.shape[0]) layer in training mode and apply it to q. Note that no training will be performed on this layer, since the next forward pass will re-create a new one.
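For illustration, here is a minimal sketch of the difference (the class names and the fixed feature size are made up): a layer created in __init__ is registered as a submodule, so model.eval() / model.train() and the state_dict see it, while a layer created inside forward() is not and is re-created, untrained and in training mode, on every call.

import torch
import torch.nn as nn

class WithRegisteredBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm1d(10)          # registered submodule

    def forward(self, x):
        return self.bn(x)

class WithAdHocBN(nn.Module):
    def forward(self, x):
        bn = nn.BatchNorm1d(10)               # new, untrained layer on every call,
        return bn(x)                          # always in training mode

m1, m2 = WithRegisteredBN().eval(), WithAdHocBN().eval()
print([name for name, _ in m1.named_modules()])   # ['', 'bn']
print([name for name, _ in m2.named_modules()])   # [''] -- no bn registered
x = torch.randn(1, 10)
m1(x)                                             # works: registered bn follows eval()
# m2(x) would raise the ValueError above, since its ad-hoc bn is still training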

Yes, exactly. But since the weights are provided and I can’t retrain the network, I would like to keep this (not clean) implementation. Hence I would like to implement BatchNorm1d in a way that works here, i.e. for batch size 1. I assume the trainable parameters (which are obviously not trained here) are the default values a BatchNorm1d is initialized with.
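For example, checking the defaults of a freshly created layer (this just prints the initialization, nothing specific to the model above):

import torch.nn as nn

bn = nn.BatchNorm1d(100)
print(bn.weight.data.unique())    # tensor([1.]) -- gamma initialized to ones
print(bn.bias.data.unique())      # tensor([0.]) -- beta initialized to zeros
print(bn.running_mean.unique())   # tensor([0.])
print(bn.running_var.unique())    # tensor([1.])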

You could call bn7.train(mode=self.training), but note that a newly initialized batchnorm layer in eval mode won’t do anything. It will just subtract zeros and divide by ones, so it’s a waste of compute.
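A quick sanity check of that point, assuming the default initialization: with running stats of mean 0 and var 1 and affine parameters gamma 1 and beta 0, a fresh BatchNorm1d in eval mode is (up to eps) the identity.

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(100)
bn.eval()
x = torch.randn(4, 100)
print(torch.allclose(bn(x), x, atol=1e-4))   # True -- output equals input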

Thank you, I just tried that and the inference works with a single image, but for some reason the result is still the same as when I put it into eval mode. This is how I tried it:

bn7 = nn.BatchNorm1d(q.shape[0])
bn7.train(mode=self.training)
q = bn7(q)

Yes, the batchnorm layer won’t change its inputs, since it wasn’t trained at all and is in eval mode. Or are you referring to the error when you say “the result is still the same”?

I am trying to say that the output of the model I get with your method is the same as the result I get when I put the whole model (including the bn7 layer) into eval mode instead.

As I established in the beginning, the ValueError occurs when my model is in eval mode and I don’t do anything to the bn7 layer, which makes sense, since it is then still in training mode.

It also doesn’t make sense to put the bn7 layer into eval mode, as you mentioned.

I would like to run the inference through bn7 with the values it has during training, because I guess that is what the authors intended here. That is why I thought about implementing the functionality of BatchNorm1d myself, behaving as it would in training mode, so I can run it during evaluation without hitting the initial ValueError.

The BatchNorm1d seems to work just fine when left in training mode, as long as we use a batch size of at least 2, which makes sense. BatchNorm1d over a batch size of 1 would be pointless.
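To make that concrete, here is a rough sketch of just the training-mode batchnorm math (batch statistics with the default gamma=1, beta=0, as in the untrained bn7): with a single sample, the batch mean equals the sample itself, so every output collapses to zero.

import torch

def manual_batchnorm_train(x, eps=1e-5):
    # x: (N, C); statistics are computed per channel over the batch dimension N
    mean = x.mean(dim=0, keepdim=True)
    var = x.var(dim=0, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

print(manual_batchnorm_train(torch.randn(1, 1)))   # tensor([[0.]]) -- batch size 1 gives all zeros
print(manual_batchnorm_train(torch.randn(4, 1)))   # meaningful output for batch size >= 2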