I am currently working on 3D deep learning with 3D convolutions on grids. I've encountered a problem: if the 3D grid size is above (110, 110, 110), the number of allocated tensors on the GPU starts to grow when calling loss.backward(), causing a GPU memory leak.
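For reference, I'm counting the allocated tensors with a gc-based loop, roughly like this (a minimal sketch; the helper name is just mine):

```python
import gc
import torch

def count_cuda_tensors():
    """Count live CUDA tensors currently tracked by the garbage collector."""
    n = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                n += 1
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            pass
    return n
```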
If I return x1 after kernel_3, the problem does not happen; if I return x1 after kernel_4 or later, it does. All layers are 3D convolutions with 1x1x1 kernels.
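To make the layer stack concrete, here is a minimal sketch of what I mean (the kernel_N names match my model, but the depth and channel counts here are illustrative, not the real definition):

```python
import torch.nn as nn

class Sketch3DNet(nn.Module):
    """Illustrative stack of 1x1x1 3D convolutions, not the full model."""
    def __init__(self, channels=16):
        super().__init__()
        self.kernel_1 = nn.Conv3d(channels, channels, kernel_size=1)
        self.kernel_2 = nn.Conv3d(channels, channels, kernel_size=1)
        self.kernel_3 = nn.Conv3d(channels, channels, kernel_size=1)
        self.kernel_4 = nn.Conv3d(channels, channels, kernel_size=1)
        self.kernel_5 = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x):
        x1 = self.kernel_1(x)
        x1 = self.kernel_2(x1)
        x1 = self.kernel_3(x1)
        # Returning x1 here: no tensor growth.
        x1 = self.kernel_4(x1)
        # Returning after kernel_4 (or later): the growth appears.
        x1 = self.kernel_5(x1)
        return x1
```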
Are you passing an input of [batch_size, channels, 110, 110, 110] to nn.Conv3d layers, or do I misunderstand your question regarding the grid?
Could you post the model definition so that we can reproduce this issue, please?
Yes, exactly. The growth in allocated tensors starts when I increase the input size from [1, 16, 110, 110, 110] to e.g. [1, 16, 115, 115, 115].
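A minimal repro along those lines looks roughly like this (a sketch: the real model is stubbed in as an nn.Sequential of 1x1x1 Conv3d layers, and the mean loss is just for illustration):

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Stand-in for the real model: a stack of 1x1x1 3D convolutions.
model = nn.Sequential(*[nn.Conv3d(16, 16, kernel_size=1) for _ in range(5)]).to(device)

for size in (110, 115):
    x = torch.randn(1, 16, size, size, size, device=device)
    loss = model(x).mean()  # illustrative loss
    loss.backward()
    torch.cuda.synchronize()
    print(f"grid {size}: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")
    model.zero_grad(set_to_none=True)  # reset gradients between sizes
```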