Very high forward/backward pass size

I have implemented a PixelCNN model for 3-dimensional (volumetric) images. For an input size of 14x14x14, the architecture is as follows:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv3d-1        [-1, 2, 14, 14, 14]              36
            Conv3d-2        [-1, 2, 14, 14, 14]               4
            Conv3d-3        [-1, 2, 14, 14, 14]              12
            Conv3d-4        [-1, 2, 14, 14, 14]               4
            Conv3d-5        [-1, 2, 14, 14, 14]               4
    MaskedConv3d_h-6        [-1, 2, 14, 14, 14]               4
        Activation-7        [-1, 2, 14, 14, 14]               0
        Activation-8        [-1, 2, 14, 14, 14]               0
        Activation-9        [-1, 2, 14, 14, 14]               0
           Conv3d-10        [-1, 2, 14, 14, 14]               4
           Conv3d-11        [-1, 2, 15, 14, 14]              72
           Conv3d-12        [-1, 2, 14, 15, 14]              24
           Conv3d-13        [-1, 2, 14, 14, 14]               4
           Conv3d-14        [-1, 2, 14, 14, 14]               4
           Conv3d-15        [-1, 2, 14, 14, 14]               4
           Conv3d-16        [-1, 2, 14, 14, 15]               8
       Activation-17        [-1, 2, 14, 14, 14]               0
       Activation-18        [-1, 2, 14, 14, 14]               0
       Activation-19        [-1, 2, 14, 14, 14]               0
           Conv3d-20        [-1, 2, 14, 14, 14]               4
StackedConvolution-21  [[-1, 2, 14, 14, 14], [-1, 2, 14, 14, 14], [-1, 2, 14, 14, 14]]               0
           Conv3d-22        [-1, 2, 15, 14, 14]              72
           Conv3d-23        [-1, 2, 14, 15, 14]              24
           Conv3d-24        [-1, 2, 14, 14, 14]               4
           Conv3d-25        [-1, 2, 14, 14, 14]               4
           Conv3d-26        [-1, 2, 14, 14, 14]               4
           Conv3d-27        [-1, 2, 14, 14, 15]               8
       Activation-28        [-1, 2, 14, 14, 14]               0
       Activation-29        [-1, 2, 14, 14, 14]               0
       Activation-30        [-1, 2, 14, 14, 14]               0
           Conv3d-31        [-1, 2, 14, 14, 14]               4
StackedConvolution-32  [[-1, 2, 14, 14, 14], [-1, 2, 14, 14, 14], [-1, 2, 14, 14, 14]]               0
           Conv3d-33        [-1, 2, 15, 14, 14]              72
           Conv3d-34        [-1, 2, 14, 15, 14]              24
           Conv3d-35        [-1, 2, 14, 14, 14]               4
           Conv3d-36        [-1, 2, 14, 14, 14]               4
           Conv3d-37        [-1, 2, 14, 14, 14]               4
           Conv3d-38        [-1, 2, 14, 14, 15]               8
       Activation-39        [-1, 2, 14, 14, 14]               0
       Activation-40        [-1, 2, 14, 14, 14]               0
       Activation-41        [-1, 2, 14, 14, 14]               0
           Conv3d-42        [-1, 2, 14, 14, 14]               4
StackedConvolution-43  [[-1, 2, 14, 14, 14], [-1, 2, 14, 14, 14], [-1, 2, 14, 14, 14]]               0
           Conv3d-44        [-1, 2, 14, 14, 14]               6
      BatchNorm3d-45        [-1, 2, 14, 14, 14]               4
       Activation-46        [-1, 2, 14, 14, 14]               0
          Dropout-47        [-1, 2, 14, 14, 14]               0
           Conv3d-48        [-1, 3, 14, 14, 14]               6
================================================================
Total params: 444
Trainable params: 444
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.02
Forward/backward pass size (MB): 14832.59
Params size (MB): 0.00
Estimated Total Size (MB): 14832.61
----------------------------------------------------------------

What is really confusing to me is the forward/backward pass size. The model trains fine for a small input like this, but does 14832 MB make any sense? I only have 444 trainable parameters…
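For context, my own back-of-envelope estimate of the activation memory comes out far smaller (a minimal sketch, assuming float32 activations, batch size 1, and a factor of 2 for tensors kept around for the backward pass; the layer count and shapes are read off the summary above, ignoring the few rows with a 15 in one dimension and the multi-output StackedConvolution rows):

```python
# Rough manual estimate of activation memory for the summary above.
# Assumptions: float32 (4 bytes/value), batch size 1, factor of 2 for
# activations retained for the backward pass.
num_layers = 48                      # rows in the summary
values_per_layer = 2 * 14 * 14 * 14  # [-1, 2, 14, 14, 14] output shape
bytes_total = 2 * num_layers * values_per_layer * 4

print(f"{bytes_total / 1024**2:.2f} MB")  # ~2 MB, nowhere near 14832 MB
```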

Besides the parameters, the forward activations will also be stored during training in order to compute the gradients during the backward pass.
This post explains it in more detail with an example.
Based on the posted summary, I wouldn't expect to see ~15 GB of memory usage, but I also don't know how the summary tool arrives at that number or how it estimates the forward activation memory.
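To check what your training actually needs, you could measure the peak allocated memory directly instead of relying on the summary (a minimal sketch; `model` stands in for your PixelCNN instance, and the single input channel and CUDA device are my assumptions):

```python
import torch

device = torch.device("cuda")
model = model.to(device)  # `model` is your PixelCNN instance (assumption)
x = torch.randn(1, 1, 14, 14, 14, device=device)  # assuming 1 input channel

torch.cuda.reset_peak_memory_stats(device)
out = model(x)
out.sum().backward()  # dummy scalar loss, just to trigger the backward pass

peak_mb = torch.cuda.max_memory_allocated(device) / 1024**2
print(f"Peak allocated during forward/backward: {peak_mb:.2f} MB")
```

If this comes out in the low-MB range, as the parameter count and activation shapes suggest, the ~15 GB figure is most likely an artifact of how the summary estimates output sizes; note that the StackedConvolution modules return multiple outputs, which such summary tools may not account for correctly.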