Simple CNN model takes too much memory when forward() is called

Hello,
In the following code, I define a simple CNN model composed of:
Conv2d(256 output channels, 5×5 kernel) → ReLU → Flatten()

When I run the model on a sample of input data, torch.cuda.memory_allocated() reports roughly 1.8 GB allocated. Is that normal? If not, why is it so high, and what am I doing wrong?

the code:
(PS: cfg is a local config module that provides DEVICE.)

import gc

import cfg  # local config module providing DEVICE
import torch
from torch import nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.LazyConv2d(out_channels=256, kernel_size=5)
        self.layer2 = nn.Flatten()
        self.model = nn.Sequential(self.layer1, nn.ReLU(), self.layer2)

    def forward(self, X):
        print("Before model call")
        print("torch.cuda.memory_allocated: %fGB" % (torch.cuda.memory_allocated(0) / 1024**3))

        out = self.model(X)

        print("\n\nAfter model call")
        print("torch.cuda.memory_allocated: %fGB" % (torch.cuda.memory_allocated(0) / 1024**3))

        return out


model = Model()
model.to(cfg.DEVICE)

gc.collect()
torch.cuda.empty_cache()

X = torch.randn(size=(8, 1, 500, 500)).to(cfg.DEVICE)
output = model(X)

the output:

Before model call
torch.cuda.memory_allocated: 0.007451GB


After model call
torch.cuda.memory_allocated: 1.884429GB

The memory usage is expected, as the output tensor alone already accounts for almost all of the allocated memory:

print(output.nelement() * output.element_size() / 1024**3)
# 1.876953125
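
For reference, the numbers can be reproduced by hand from the shapes in your post. With kernel_size=5, stride 1, and no padding, the conv maps the 500×500 input to 496×496, so after Flatten the output has shape (8, 256 * 496 * 496) in float32 (4 bytes per element). A quick sanity check:

# Conv2d output size with kernel_size=5, stride=1, no padding: 500 - 5 + 1 = 496
out_hw = 500 - 5 + 1

# Input tensor (8, 1, 500, 500), float32 -> matches the "before" reading
input_gb = 8 * 1 * 500 * 500 * 4 / 1024**3
print(input_gb)   # ~0.00745

# Flattened output (8, 256 * 496 * 496), float32 -> matches the snippet above
output_gb = 8 * 256 * out_hw * out_hw * 4 / 1024**3
print(output_gb)  # 1.876953125

# input + output ~ 1.8844 GB, i.e. the "after" reading; the conv layer's
# parameters (256 * 1 * 5 * 5 weights + 256 biases, ~26 KB) are negligible.

So nothing is wrong with the code itself: a 256-channel feature map kept at almost the full 500×500 input resolution is simply that large.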