Input size (MB) of a 6 x 224 x 224 tensor shows 84 GB

Hi there,

Good day everyone. I am working on a network whose input is a 6x224x224 tensor; however, summary() from pytorch-summary reports that the input size is around 86,436.00 MB (about 84 GB), so the process gets killed.
I have googled how the input size is calculated, but my calculation does not match the reported value.
Is there anything wrong with my code?
Thanks ~~

Dick

Here is the output of summary():


```
    Layer (type)               Output Shape         Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 3,520
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
MaxPool2d-15 [-1, 256, 28, 28] 0
Conv2d-16 [-1, 512, 28, 28] 1,180,160
ReLU-17 [-1, 512, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 2,359,808
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
MaxPool2d-22 [-1, 512, 14, 14] 0
Conv2d-23 [-1, 512, 14, 14] 2,359,808
ReLU-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
MaxPool2d-29 [-1, 512, 7, 7] 0
Linear-30 [-1, 4096] 102,764,544
ReLU-31 [-1, 4096] 0
Linear-32 [-1, 4096] 16,781,312
ReLU-33 [-1, 4096] 0
Linear-34 [-1, 1] 4,097

Total params: 133,676,289
Trainable params: 133,676,289
Non-trainable params: 0

Input size (MB): 86436.00
Forward/backward pass size (MB): 206.27
Params size (MB): 509.93
Estimated Total Size (MB): 87152.20
```
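
(As a side check, the params size line is consistent with the parameter count, assuming 4 bytes per float32 parameter:)

```
# 133,676,289 float32 parameters x 4 bytes each
print(133_676_289 * 4 / 1024 ** 2)   # ~509.93 -- matches "Params size (MB)"
```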

Here is the forward code; I use torch.cat to stack the two 3-channel images into a 6-channel input:
```
def forward(self, leftimage, rightimage):
    # stack the two 3-channel images along the channel dimension -> 6 channels
    combine = torch.cat((leftimage, rightimage), 1)
    combine = self.encoder(combine)
    #combine = self.encoder(leftimage)

    #print(combine.shape)
    combine = torch.flatten(combine, 1)
    #print(combine.shape)
    combine = self.classifier(combine)
    return combine
```

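For reference, a quick standalone check with random tensors (not part of the model code) shows that concatenating along dim=1 does give a 6-channel input:

```
import torch

left = torch.randn(2, 3, 224, 224)   # batch of 2 left images
right = torch.randn(2, 3, 224, 224)  # batch of 2 right images

# channel-wise concatenation, as in forward()
print(torch.cat((left, right), 1).shape)  # torch.Size([2, 6, 224, 224])
```
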
What is your batch size? That's probably the problem.

Hi Juan,

Thanks for your reply. The batch size is 2, which means 4 images (3x224x224 each) are loaded per batch.
I have tried the following configurations on the model:

  1. 3 channels -> 0.57 MB (3x224x224)
  2. 4 channels -> 3.8 GB (2x224x224, 2x224x224)

Here is the code for the model:

```
import torch
import torch.nn as nn

class Vgg16PairInput(nn.Module):
    
    def weight_init(self,m):
        classname=m.__class__.__name__
        if classname.find('ConvTran')!=-1:
            m.weight.data.normal_(1,0.5)
    
    def __init__(self):
        super(Vgg16PairInput,self).__init__()
        #self.pretrained_model = models.vgg16(pretrained=True)
        #self.pretrained_model = models.vgg16(pretrained=True)
        self.encoder = nn.Sequential(
                nn.Conv2d(6,64,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.Conv2d(64,64,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2), #128 112
                nn.Conv2d(64,128,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(128),
                nn.ReLU(),
                nn.Conv2d(128,128,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(128),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2), #64 56
                nn.Conv2d(128,256,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(256),
                nn.ReLU(),
                nn.Conv2d(256,256,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(256),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2), #32 28
                nn.Conv2d(256,512,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(512),
                nn.ReLU(),
                nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(512),
                nn.ReLU(),
                nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(512),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2), #16 14
                nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(512),
                nn.ReLU(),
                nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(512),
                nn.ReLU(),
                nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
                #nn.BatchNorm2d(512),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2), #8  7                    
                                
        )
        
        
        self.classifier=nn.Sequential(
                nn.Linear(7*7*512,4096),
                nn.ReLU(),
                nn.Linear(4096,4096),
                nn.ReLU(),
                nn.Linear(4096,1)
        )
                        
        #del self.pretrained_model
        #self.encoder.apply(self.weight_init)
        
        
    def encode(self,images):
        code=self.encoder(images)
        return code
    
    
    
    def forward(self,leftimage,rightimage):
        combine=torch.cat((leftimage,rightimage),1)
        combine=self.encoder(combine)
        #combine=self.encoder(leftimage)
        
        #print(combine.shape)
        combine=torch.flatten(combine,1)
        #print(combine.shape)
        combine=self.classifier(combine)
        return combine
```
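
For a two-input model like this, pytorch-summary takes a list of input shapes, one per forward() argument; presumably the call looks something like the sketch below (device handling is an assumption):

```
from torchsummary import summary

# one shape per forward() argument; the batch dimension is added by the library
model = Vgg16PairInput().cuda()  # pytorch-summary assumes float tensors on CUDA by default
summary(model, [(3, 224, 224), (3, 224, 224)])
```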

Thanks,
Dick

Hi All,

I have checked the pytorch-summary source code and found that the input size is calculated as follows:

```
# assume 4 bytes/number (float on cuda).
total_input_size = abs(np.prod(sum(input_size, ()))
                       * batch_size * 4. / (1024 ** 2.))
```

The ~84 GB figure comes from multiplying the two input tensor shapes together:
(224 * 224 * 3) * (224 * 224 * 3) * 4 = 90,634,715,136 bytes
90,634,715,136 / (1024^2) = 86,436 MB
which matches the displayed value.
So I had misunderstood it: I assumed the input size would be derived by following the forward function, but instead sum(input_size, ()) flattens the two shapes into one tuple and np.prod multiplies everything together.

So in my case, the actual input size of the model should be (224 * 224 * 6 * 4) / (1024^2) = 1.148 MB?
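
A quick numeric check of both figures, reproducing the formula above (a sketch, not the library code):

```
import numpy as np

input_size = [(3, 224, 224), (3, 224, 224)]  # the two shapes given to summary()
batch_size = -1                              # torchsummary's default placeholder

# sum(..., ()) flattens the two shapes into ONE tuple: (3, 224, 224, 3, 224, 224),
# so np.prod multiplies the two inputs together instead of adding them
flat = sum(input_size, ())
reported = abs(np.prod(flat, dtype=np.int64) * batch_size * 4. / (1024 ** 2.))
print(reported)   # 86436.0 -- matches the summary output

# what one 6x224x224 float32 sample actually occupies
print(6 * 224 * 224 * 4 / (1024 ** 2))   # ~1.148 MB
```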

Thanks,
Dick

Hi,
The basic input image (a 224x224x3 tensor) would be
224 * 224 * 3 = 150,528 elements (numbers)
150,528 * 4 = 602,112 bytes = 0.574 MB
So in the end, for a batch size of 2 and 2 images per input, we get
0.574 * 2 * 2 ≈ 2.3 MB

Therefore I think you probably have a bug in the dataset/loader.
I would suggest inspecting the shape of the tensors before sending them to the GPU, as trying to allocate 86 GB is way off from the size it should actually need.
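
For example, something along these lines (a sketch; `train_loader` is a placeholder for your own DataLoader):

```
# check the batch shapes before any .cuda()/.to(device) call
for leftimage, rightimage in train_loader:    # train_loader is a placeholder name
    print(leftimage.shape, rightimage.shape)  # expect torch.Size([2, 3, 224, 224]) each
    break
```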

Another option is that you passed the wrong input size to summary().

Hi Juan,

Sorry for my very late reply, and thanks for your help again.
The data size problem is solved; it turned out that this two-input model is not something torchsummary can size correctly. After that, my Jetson Nano died and I switched to a desktop to continue the journey. Actually, it is better than the Jetson Nano, since it has more RAM to work with than the 4GB limit. :rofl:

Furthermore, I have made a small breakthrough on the model as well. If you are interested, here is the notebook link for your review:
https://www.kaggle.com/tik65536/low-cost-diamond-feature-preliminary-report

I'd value your comments. :muscle:

Thanks,
Dick