CUDA out of memory when setting a stride of 1

I’m working with an architecture that uses ResNet50 as the backbone on Colab. When I try to train with a stride of 1 on the first 7x7 conv layer (3 input channels, 64 output channels), it raises a CUDA out-of-memory error, although the code runs fine with a stride of 2 (the default) at a batch size of 8. With a stride of 1, I reduced the batch size all the way down to 2 and still couldn’t run it.

I’m curious why the stride would affect CUDA memory usage; I don’t know what happens under the hood in GPU computation. Isn’t memory usage independent of the stride? Is there anything I could do to run this layer with a stride of 1?

Is there a formula relating (batch_size, kernel_size, in_channels, out_channels, stride, input_resolution) to the memory used?

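If the stride is decreased, the spatial dimensions of the output (as well as those of the intermediate activations stored during training) increase and incur additional memory usage. Going from a stride of 2 to a stride of 1 roughly doubles the output height and width, so the activations take roughly four times as much memory.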
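As a rough illustration of that relation, here is a minimal sketch that applies the standard convolution output-size formula to the first layer and estimates the size of its output activation. The 224x224 input, the padding of 3, and float32 storage are assumptions for illustration, not values from the post, and the helper names are made up. It only counts a single output tensor; real training memory also includes gradients, the other layers, and allocator overhead.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def activation_mib(batch, out_channels, in_size, kernel=7, stride=2,
                   padding=3, bytes_per_elem=4):
    """Size in MiB of one conv output tensor for a square input."""
    out = conv_out_size(in_size, kernel, stride, padding)
    return batch * out_channels * out * out * bytes_per_elem / 2**20


# Assumed setup: batch of 8, 224x224 input, 64 output channels, float32.
for stride in (2, 1):
    out = conv_out_size(224, kernel=7, stride=stride, padding=3)
    mib = activation_mib(8, 64, 224, stride=stride)
    print(f"stride={stride}: output {out}x{out}, {mib:.1f} MiB")
# stride=2: output 112x112, 24.5 MiB
# stride=1: output 224x224, 98.0 MiB
```

Note that kernel_size and in_channels mainly determine the weight size, which is tiny for this layer; the activation size is driven by batch_size, out_channels, input_resolution, and stride. Since nothing downstream compensates for the larger feature map, every later layer in the backbone also sees about four times as many activation elements.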


I see. However, I had already run a higher input resolution with a stride of 2, such that the output feature map was larger than my current input, and I didn’t hit any problem. That’s why this seems strange to me.

If you can post a standalone example of these two cases (e.g., stride 2 with higher resolution that should use a comparable amount of memory and the failing stride 1 case), that could be useful in understanding if there is a bug causing unexpected memory usage.