Total estimated model params size is bigger than trainable params

Hello,

I have a problem with the model size. The description of my model size is as follows:

  | Name          | Type  | Params
----------------------------------------
0 | model_        | Model | 44.5 M
1 | loss_function | Loss  | 0
----------------------------------------
44.5 M    Trainable params
0         Non-trainable params
44.5 M    Total params
178.099   Total estimated model params size (MB)

During training I get this runtime error:

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB

I am curious why the "Total estimated model params size" is so much bigger than the number of trainable params, and what the difference between them is.
How can I reduce the total params size to avoid this error?

I appreciate any help.

I don’t think the param size is too large, as it is calculated as the number of elements across all parameters multiplied by their element size. Assuming you are using the default float32 dtype (4 bytes per element), a quick estimation would be:

(44.5 * 1e6 * 4) / 1024**2
# 169.7540283203125

where the posted estimation uses / 1000**2 (decimal megabytes) instead of / 1024**2 to calculate the size in MB:

(44.5 * 1e6 * 4) / 1000**2
# 178.0
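So the two numbers describe the same thing in different units: 44.5 M is a parameter *count*, while 178.099 is that count converted to *memory* in MB. The calculation above can be wrapped in a small helper (a minimal sketch; the function name and defaults here are illustrative, not part of any library API):

def estimated_param_size_mb(num_params: int,
                            bytes_per_param: int = 4,
                            divisor: int = 1000) -> float:
    """Estimate parameter memory in megabytes.

    bytes_per_param: 4 for float32, 2 for float16/bfloat16.
    divisor: 1000 for decimal MB (as Lightning reports), 1024 for binary MiB.
    """
    return num_params * bytes_per_param / divisor**2

# 44.5 M float32 parameters:
mb = estimated_param_size_mb(int(44.5e6))                # decimal MB, matches the 178.099 summary
mib = estimated_param_size_mb(int(44.5e6), divisor=1024)  # binary MiB, ~169.75

Note that this only accounts for the parameters themselves; the actual CUDA memory usage during training is much larger, since activations, gradients, and optimizer states also live on the device, which is why a ~178 MB model can still run out of memory when a 2 GiB allocation is requested.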