Hi there!

I am seeing strange behaviour from CUDA when using a handmade model. I have been using variants of ResNet for some time now, but since I am overfitting I decided to create a smaller model to see how it goes.

The thing is, when I use resnet18 (which has around 11M parameters) it takes around 3GB of CUDA memory (with batches of 64 images of size 224*224). However, when I use my custom model, which has around 4M parameters, it takes almost 10GB of CUDA memory. My model is the following:

```
GradesClassifModel(
  (base_model): Sequential(
    (0): Sequential(
      (0): Sequential(
        (0): ConvBnRelu(
          (conv): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1))
          (bn): BatchNorm2d(64, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
          (relu): ReLU()
        )
        (1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Sequential(
        (0): ConvBnRelu(
          (conv): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
          (relu): ReLU()
        )
        (1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Sequential(
        (0): ConvBnRelu(
          (conv): Conv2d(128, 256, kernel_size=(5, 5), stride=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
          (relu): ReLU()
        )
        (1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Sequential(
        (0): ConvBnRelu(
          (conv): Conv2d(256, 512, kernel_size=(5, 5), stride=(1, 1))
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
          (relu): ReLU()
        )
        (1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
    )
  )
  (head): Sequential(
    (0): AdaptiveAvgPool2d(output_size=1)
    (1): Flatten()
    (2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25, inplace=False)
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): Dropout(p=0.5, inplace=False)
    (7): Linear(in_features=512, out_features=2, bias=True)
  )
)
```
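
Since training memory depends on the feature maps kept for the backward pass and not only on the parameter count, a rough way to reason about the numbers is to compute the output size of each block in the printout above. The sketch below is only an estimate under stated assumptions: float32 values, a 224*224 input, a batch of 64, and the simplification that the conv, bn, relu and max-pool outputs of each block are all held in memory (what autograd actually keeps may differ). The helper names `out_size` and `activation_report` are made up for this illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def activation_report(input_size=224, batch_size=64,
                      channels=(64, 128, 256, 512), bytes_per_value=4):
    """Estimate per-block activation memory for the conv stack shown above.

    Assumption: each block is Conv2d(k=5, s=1) -> BatchNorm2d -> ReLU ->
    MaxPool2d(k=3, s=2), and all four outputs are counted as stored.
    Returns a list of (channels, conv_size, pool_size, MiB) tuples.
    """
    rows = []
    size = input_size
    for ch in channels:
        conv_size = out_size(size, kernel=5, stride=1)
        pool_size = out_size(conv_size, kernel=3, stride=2)
        conv_values = ch * conv_size * conv_size   # shape shared by conv, bn, relu
        pool_values = ch * pool_size * pool_size
        total_values = batch_size * (3 * conv_values + pool_values)
        mib = total_values * bytes_per_value / 2**20
        rows.append((ch, conv_size, pool_size, mib))
        size = pool_size
    return rows


if __name__ == "__main__":
    for ch, conv_size, pool_size, mib in activation_report():
        print(f"{ch:4d} channels: conv {conv_size}x{conv_size}, "
              f"pool {pool_size}x{pool_size}, ~{mib:.0f} MiB")
```

With these assumptions the first block works on 220*220 feature maps with 64 channels, because the 5x5 convolution uses stride 1 and no padding, so most of the estimated activation memory sits in that first block rather than in the weights.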

Does anyone have an idea why this happens? I have no clue whether this is normal due to some obscure autograd behaviour or whether I did something wrong.