Hi! I have a problem: one layer in my model takes up about 6 GB of GPU memory during the forward pass, so I am unable to run batch sizes larger than 1 on my GPU.
Of course, I am not interested in running the model with batch size 1, and I am looking for ways to improve this.
I was thinking about "emulating" a larger batch size, e.g., passing 10 single-example batches through the model and then calling the optimiser once (rather than after each batch-of-1). But how do I do that? If I did it manually, I would sum the gradients of the 10 runs, divide by 10, and then use the result to update the model parameters, right? But I am a bit clueless about how to achieve the same thing with PyTorch's built-in optimisers like SGD or Adam.
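From what I understand, this is usually called gradient accumulation, and the built-in optimisers support it without any manual summing: PyTorch already sums gradients into the `.grad` buffers across successive `backward()` calls, so you only need to divide each loss by the number of accumulation steps and delay `optimizer.step()`. A minimal sketch, using a hypothetical `nn.Linear` model and random data purely as stand-ins:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model and data, just to illustrate the loop.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
w_before = model.weight.detach().clone()

accum_steps = 10                  # emulate an effective batch size of 10
optimizer.zero_grad()             # clear any stale gradients once, up front
for _ in range(accum_steps):
    x = torch.randn(1, 16)        # one batch-of-1
    target = torch.randn(1, 4)
    # Dividing by accum_steps makes the summed gradients equal the average.
    loss = loss_fn(model(x), target) / accum_steps
    loss.backward()               # .grad buffers accumulate (sum) across calls
optimizer.step()                  # a single parameter update for all 10 batches
optimizer.zero_grad()             # reset before the next accumulation cycle
```

Note that this is not bit-identical to a true batch of 10 when the model contains BatchNorm layers, since the normalisation statistics are still computed per batch-of-1.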
Or perhaps there is a better way to achieve the batching effect, or even to rework the layer for a smaller memory footprint?
I am processing massive 3D data, hence the memory problem. Here is the 3D convolution module that causes the high memory usage:
class My3DConv(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv3d args: in_channels, out_channels, kernel_size, stride, padding
        self.m1 = nn.Sequential(
            nn.Conv3d(128, 64, 3, stride=(2, 1, 1), padding=(1, 1, 1)),
            nn.BatchNorm3d(64),
            nn.ReLU(),
        )
        self.m2 = nn.Sequential(
            nn.Conv3d(64, 64, 3, stride=(1, 1, 1), padding=(0, 1, 1)),
            nn.BatchNorm3d(64),
            nn.ReLU(),
        )
        self.m3 = nn.Sequential(
            nn.Conv3d(64, 64, 3, stride=(2, 1, 1), padding=(1, 1, 1)),
            nn.BatchNorm3d(64),
            nn.ReLU(),
        )

    def forward(self, data):
        data = self.m1(data)
        data = self.m2(data)
        data = self.m3(data)
        return data