How to mimic larger batch size when only 1 example fits into memory

Hi! I have a problem: one layer in my model takes up about 6 GB of GPU RAM for a forward pass, so I am unable to run batch sizes larger than 1 on my GPU.

Of course, I am not interested in running the model with batch_size 1, so I am looking for ways to improve this.

I was thinking about “emulating” a larger batch size, e.g. passing 10 single-example batches through the model and then calling the optimiser once (not after each batch-of-1). But how do I do that? If I did it manually, I would sum the gradients of the 10 runs, divide by 10 and then use the result to update the model parameters, right? But I am a bit clueless about how to achieve the same thing with PyTorch's built-in optimisers like SGD or Adam.
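For what it's worth, here is a minimal gradient-accumulation sketch of that idea. It relies on the fact that loss.backward() adds gradients into each parameter's .grad instead of overwriting it, so no manual summing is needed; scaling the loss by 1/accum_steps gives the average described above. Note that model, data_loader (with batch_size=1), criterion and the learning rate below are placeholders for your own objects, not something from this thread:

import torch

# Assumptions: `model`, `data_loader` (batch_size=1) and `criterion` are your own
# model, dataloader and loss function; the optimiser and lr are just examples.
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
accum_steps = 10  # virtual batch size

optimiser.zero_grad()
for i, (x, y) in enumerate(data_loader):
    output = model(x.cuda())
    loss = criterion(output, y.cuda())
    # backward() accumulates into .grad, so dividing the loss by accum_steps
    # makes the accumulated gradient equal the average over 10 examples
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimiser.step()       # one parameter update using the accumulated gradients
        optimiser.zero_grad()  # clear .grad before the next virtual batch

Dividing by accum_steps versus plain summing only changes the gradient scale by a constant factor, but the division keeps the loss magnitude comparable to a real batch of 10, so the same learning rate should roughly carry over.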

Or perhaps there is a better way to achieve the batch effect, or even a way to rework the layer for a smaller memory footprint?

I am processing massive 3D data, hence the memory problem. Here is the 3D convolution module that causes the high memory usage:

import torch.nn as nn

class My3DConv(nn.Module):

  def __init__(self):
    super().__init__()
    # Conv3d arguments: in_channels, out_channels, kernel_size, stride, padding
    self.m1 = nn.Sequential(nn.Conv3d(128, 64, 3, stride=(2,1,1), padding=(1,1,1)), nn.BatchNorm3d(64), nn.ReLU())
    self.m2 = nn.Sequential(nn.Conv3d(64, 64, 3, stride=(1,1,1), padding=(0,1,1)), nn.BatchNorm3d(64), nn.ReLU())
    self.m3 = nn.Sequential(nn.Conv3d(64, 64, 3, stride=(2,1,1), padding=(1,1,1)), nn.BatchNorm3d(64), nn.ReLU())

  def forward(self, data):
    data = self.m1(data)
    data = self.m2(data)
    data = self.m3(data)
    return data
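On the memory-footprint side, one option (not from this thread, just a hedged sketch) is activation checkpointing via torch.utils.checkpoint: it avoids storing the intermediate activations of m1/m2/m3 during the forward pass and recomputes them during backward, trading extra compute for memory. Something like:

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class My3DConvCheckpointed(nn.Module):
  """Same layers as My3DConv, but each block's activations are recomputed
  in the backward pass instead of being kept in GPU memory."""

  def __init__(self):
    super().__init__()
    self.m1 = nn.Sequential(nn.Conv3d(128, 64, 3, stride=(2,1,1), padding=(1,1,1)), nn.BatchNorm3d(64), nn.ReLU())
    self.m2 = nn.Sequential(nn.Conv3d(64, 64, 3, stride=(1,1,1), padding=(0,1,1)), nn.BatchNorm3d(64), nn.ReLU())
    self.m3 = nn.Sequential(nn.Conv3d(64, 64, 3, stride=(2,1,1), padding=(1,1,1)), nn.BatchNorm3d(64), nn.ReLU())

  def forward(self, data):
    # use_reentrant=False needs a reasonably recent PyTorch; drop the argument on older versions
    data = checkpoint(self.m1, data, use_reentrant=False)
    data = checkpoint(self.m2, data, use_reentrant=False)
    data = checkpoint(self.m3, data, use_reentrant=False)
    return data

One caveat to double-check: each checkpointed block runs twice per iteration (forward plus recompute), so training gets slower, and as far as I know the BatchNorm running statistics are also updated during the recompute.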

This post gives you some examples with advantages and shortcomings. 🙂