`BatchNorm1d()` with batchsize=1

Q1: How does BatchNorm1d() decide whether the current forward() call is training or inference? Are there parameters that can be inspected and set manually?
Q2: Specifically, I’m trying to implement a reinforcement learning task. During training, each transition tuple <state, action, state_next, reward> has to be generated one at a time by calling forward(), with the state as input and the action and reward as outputs. The tuple is then added to the experience replay memory for the subsequent minibatch optimization. This means that while the transition tuples are being generated, the input to forward() has a batch size of 1, so BatchNorm1d() cannot work because the standard deviation is 0. How can I deal with this problem effectively? My current simple idea is to randomly generate some transition tuples without using forward() for initialization. Is there a more effective, established method?

  1. The internal .training attribute determines the behavior of some layers, e.g. batch norm layers.
    If you call model.train() or model.eval() (or model.bn_layer.train()), this internal flag will be switched.

  2. If you are using a single sample as your batch, you might consider using other normalization layers, e.g. InstanceNorm.
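As a minimal sketch of point 1 (the tiny model and tensor shapes below are made up for illustration), this shows the .training flag being flipped by train() and eval(), and how a single-sample forward pass behaves in each mode. In recent PyTorch releases the training-mode call is expected to raise a ValueError; older versions may behave differently.

```python
import torch
import torch.nn as nn

# Hypothetical toy policy network, not taken from the original post.
model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU())
bn = model[1]
single = torch.randn(1, 8)  # one state, i.e. batch size 1

# Training mode: batch statistics are computed from the current input.
model.train()
train_flag = bn.training
try:
    model(single)
    train_error = None
except ValueError as err:  # one value per channel gives no usable variance
    train_error = err

# Eval mode: the stored running statistics are used instead,
# so a single sample can pass through the layer.
model.eval()
eval_flag = bn.training
with torch.no_grad():
    out = model(single)

print(train_flag, eval_flag, out.shape)
```

For the reinforcement learning case, one possible pattern is to call model.eval() while collecting transitions one by one and model.train() before each minibatch update from the replay memory.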

Hi there,

I run on multiple GPUs with randomly varying batch sizes.
I found that sometimes the last batch of the epoch has size 1, and then BatchNorm1d throws an error and stops my run.

I worked around it locally with try/except, but it seems like a bug to me…

To drop the last, potentially smaller, batch of each epoch, you could specify drop_last=True in your DataLoader.
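A small sketch of that suggestion, using a made-up dataset of 33 samples so that the last batch would otherwise contain a single sample:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 33 samples, so batch_size=16 leaves a remainder of 1.
dataset = TensorDataset(torch.randn(33, 8), torch.randint(0, 2, (33,)))

# Default behavior keeps the incomplete final batch.
keep_sizes = [x.size(0) for x, _ in DataLoader(dataset, batch_size=16)]

# drop_last=True discards it, so every batch has exactly batch_size samples.
drop_sizes = [x.size(0) for x, _ in DataLoader(dataset, batch_size=16, drop_last=True)]

print(keep_sizes, drop_sizes)
```

The trade-off is that the dropped samples are not seen in that epoch; with shuffle=True a different remainder is dropped each time.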


How can I use an InstanceNorm layer with single samples (batch size = 1)?

When I try something like this:

import torch
x = torch.randn([1, 512])
m = torch.nn.InstanceNorm1d(512)
out = m(x)  # forward pass that triggers the error

I get an error like this:

InstanceNorm1d returns 0-filled tensor to 2D tensor.This is because InstanceNorm1d reshapes inputs to(1, N * C, ...) from (N, C,...) and this makesvariances 0.

Did I misunderstand something?


nn.InstanceNorm1d will calculate the statistics for each sample in the batch separately.
While this might be an advantage over batchnorm layers for small batch sizes (unsure if InstanceNorm would perform better than e.g. GroupNorm), each channel of each sample would still need more than a single scalar in the temporal dimension to compute its statistics from.
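To illustrate the point about the temporal dimension, here is a hedged sketch with made-up shapes: InstanceNorm1d applied to a 3D input (N, C, L) with L > 1, and GroupNorm applied to the flat (1, 512) case. The group count of 8 is an arbitrary assumption, and whether either layer trains well for a given model is not something this snippet can show.

```python
import torch
import torch.nn as nn

# InstanceNorm1d: statistics are computed per sample and per channel
# over the last (temporal) dimension, so that dimension needs length > 1.
x_seq = torch.randn(1, 512, 32)  # (N, C, L), a single sample
inst = nn.InstanceNorm1d(512)
y_seq = inst(x_seq)

# GroupNorm: statistics are computed per sample over groups of channels,
# so a flat (N, C) input with batch size 1 still has 64 values per group here.
x_flat = torch.randn(1, 512)
gn = nn.GroupNorm(num_groups=8, num_channels=512)
y_flat = gn(x_flat)

print(y_seq.shape, y_flat.shape)
```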

@ptrblck I know this might be a strange question, but it’s related. I want to remember the most recently used batch statistics (without using a running average) and then evaluate on a batch of size 1. Is that possible, or is the only way to create my own batch norm layer?

see: How to use have batch norm not forget batch statistics it just used?
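One possible approach, offered as a sketch rather than a confirmed answer from this thread: the running statistics are updated as (1 - momentum) * running + momentum * batch, so setting momentum=1.0 makes the stored statistics equal to those of the last training batch. The shapes below are made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# momentum=1.0: each training step overwrites the running stats
# with the statistics of the current batch (no averaging).
bn = nn.BatchNorm1d(4, momentum=1.0)

bn.train()
batch = torch.randn(8, 4)
_ = bn(batch)

last_mean = bn.running_mean.clone()
# Note: running_var stores the unbiased variance, while the training-mode
# normalization itself uses the biased one, so outputs differ slightly.
last_var = bn.running_var.clone()

# Eval mode reuses the stored statistics, so batch size 1 is fine.
bn.eval()
with torch.no_grad():
    single_out = bn(torch.randn(1, 4))

print(last_mean, last_var, single_out.shape)
```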

In the model I am training, I can only pass batches of size one through the model during training due to memory constraints. Each of my input tensors has shape (1, 1, L), with L ranging from 192 to 65536. I could normalize every input to (1, 1, 65536) so that I could do batching, but then the output of my Conv1d(512, 768, 3) layer alone, with a batch size of 128, would require 128 * 65536 * 4 (bytes per float) * 768 (channels) = 24GB of RAM, and my GTX 1080 only has 8GB of RAM.
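The estimate above can be checked with a few lines of arithmetic. This sketch assumes float32 activations and ignores the small length reduction from the kernel size of 3:

```python
batch_size = 128
length = 65536
channels = 768        # output channels of Conv1d(512, 768, 3)
bytes_per_float = 4   # float32

# Memory for one activation tensor of shape (batch_size, channels, length).
activation_bytes = batch_size * length * channels * bytes_per_float
activation_gib = activation_bytes / 2**30

print(activation_bytes, activation_gib)
```

This counts only that single layer's output; gradients and the other layers' activations would come on top of it.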

Bottom line, I can’t push more than one sample per batch. I currently call loss.backward() on each loss for 128 samples which aggregates the gradients. I then call optimizer.step() after all 128 samples have been processed.
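The accumulation scheme described above might look roughly like the sketch below. The tiny model, loss, and random data are placeholders rather than the poster's actual setup, and dividing each loss by the number of accumulation steps (to average instead of sum the gradients) is an assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model that accepts variable-length inputs of shape (1, 1, L).
model = nn.Sequential(
    nn.Conv1d(1, 4, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(4, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
accum_steps = 8  # the post uses 128; kept small here

optimizer.zero_grad()
for _ in range(accum_steps):
    length = int(torch.randint(192, 1024, (1,)))
    x = torch.randn(1, 1, length)        # one sample per forward pass
    target = torch.randn(1, 1)
    loss = criterion(model(x), target) / accum_steps
    loss.backward()                      # gradients add up in .grad
optimizer.step()                         # single update for all samples

grads_ready = all(p.grad is not None for p in model.parameters())
print(grads_ready)
```

Accumulating gradients this way reproduces the gradient of a larger batch, but it does not give a BatchNorm layer larger-batch statistics, since each forward pass still sees only one sample.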

I want to include batch normalization. Is there some way to use BatchNorm1d in this scenario, or would it be possible to implement BatchNorm1d with just the gradients produced by calling loss.backward() 128 times?