Reshaping network input to use large images

I need to run some experiments on digital mammograms to see whether I get better results with large images (the maximum image size will be [3, 1024, 1024]). So I tried to change the input size of DenseNet121 with this:

import torch.nn as nn
import torchvision
import torch.nn.functional as F
import src.experiment.settings as config

class DenseNet121_reshaped(nn.Module):
    def __init__(self):
        super(DenseNet121_reshaped, self).__init__()
        self.densenet121 = torchvision.models.densenet121(weights=torchvision.models.DenseNet121_Weights.IMAGENET1K_V1).features
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.relu = nn.ReLU()
        self.mlp = nn.Linear(in_features=config.IMG_SIZE, out_features=config.NUM_CLASSES)
        self.sigmoid = nn.Sigmoid()
    
    def maxpool(self, x):
        x = F.max_pool2d(x, kernel_size=x.size()[2:])
        return x

    def forward(self, x):
        x = self.densenet121(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = x.view(-1, config.IMG_SIZE)
        x = self.mlp(x)

        return x

And here is settings.py with the config parameters:

import torch.nn as nn
import torch.optim as optim

# Neural networks config
EPOCHS = 60
LEARNING_RATE = 0.0001  # 10**(-4)
LOSS_FUNCTION = nn.CrossEntropyLoss
OPTIMIZATION_FUNCTION = optim.Adam
NUM_CLASSES = 2
NUM_WORKERS = 4
BATCH_SIZE = 4
IMG_SIZE = 512

The code works with IMG_SIZE=1024 but crashes when I change IMG_SIZE to 512, giving this error message:

Traceback (most recent call last):
  File "train.py", line 59, in <module>
    main(train_csv, val_csv, test_csv, img_path)
  File "train.py", line 45, in main
    train_dense(train_csv, val_csv, test_csv, img_path)
  File "train.py", line 36, in train_dense
    SimpleRunner.train(task, model, logger_description='DenseNet121 reshaped input size with AdaptiveMaxPool2d on Mammogramns', train_csv_file = train_csv, val_csv_file = val_csv, test_csv_file = test_csv)
  File "/home/project/src/experiment/runner.py", line 100, in train
    train_summary, val_summary = train(model=model,
  File "/home/project/src/experiment/training.py", line 279, in train
    train_summary = train_epoch(
  File "/home/project/src/experiment/training.py", line 221, in train_epoch
    predictions, loss = train_batch(
  File "/home/project/src/experiment/training.py", line 189, in train_batch
    loss = loss_fn(predictions, batch_labels)
  File "/home/project/medical_images/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/project/medical_images/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1164, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/project/medical_images/lib/python3.8/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (8) to match target batch_size (4).

I used the torchinfo package to print summaries of the two reshaped networks:

from torchinfo import summary

summary(model, input_size=(config.BATCH_SIZE, 3, image_size, image_size), verbose=1, device='cuda')

This is with img_size=1024:

============
Layer (type:depth-idx) Output Shape Param #
============
DenseNet121_change_avg [4, 2] --
β”œβ”€Sequential: 1-1 [4, 1024, 32, 32] --
β”‚ └─Conv2d: 2-1 [4, 64, 512, 512] 9,408
β”‚ └─BatchNorm2d: 2-2 [4, 64, 512, 512] 128
β”‚ └─ReLU: 2-3 [4, 64, 512, 512] --
β”‚ └─MaxPool2d: 2-4 [4, 64, 256, 256] --
β”‚ └─_DenseBlock: 2-5 [4, 256, 256, 256] --
β”‚ β”‚ └─_DenseLayer: 3-1 [4, 32, 256, 256] 45,440
β”‚ β”‚ └─_DenseLayer: 3-2 [4, 32, 256, 256] 49,600
β”‚ β”‚ └─_DenseLayer: 3-3 [4, 32, 256, 256] 53,760
β”‚ β”‚ └─_DenseLayer: 3-4 [4, 32, 256, 256] 57,920
β”‚ β”‚ └─_DenseLayer: 3-5 [4, 32, 256, 256] 62,080
β”‚ β”‚ └─_DenseLayer: 3-6 [4, 32, 256, 256] 66,240
β”‚ └─_Transition: 2-6 [4, 128, 128, 128] --
β”‚ β”‚ └─BatchNorm2d: 3-7 [4, 256, 256, 256] 512
β”‚ β”‚ └─ReLU: 3-8 [4, 256, 256, 256] --
β”‚ β”‚ └─Conv2d: 3-9 [4, 128, 256, 256] 32,768
β”‚ β”‚ └─AvgPool2d: 3-10 [4, 128, 128, 128] --
β”‚ └─_DenseBlock: 2-7 [4, 512, 128, 128] --
β”‚ β”‚ └─_DenseLayer: 3-11 [4, 32, 128, 128] 53,760
β”‚ β”‚ └─_DenseLayer: 3-12 [4, 32, 128, 128] 57,920
β”‚ β”‚ └─_DenseLayer: 3-13 [4, 32, 128, 128] 62,080
β”‚ β”‚ └─_DenseLayer: 3-14 [4, 32, 128, 128] 66,240
β”‚ β”‚ └─_DenseLayer: 3-15 [4, 32, 128, 128] 70,400
β”‚ β”‚ └─_DenseLayer: 3-16 [4, 32, 128, 128] 74,560
β”‚ β”‚ └─_DenseLayer: 3-17 [4, 32, 128, 128] 78,720
β”‚ β”‚ └─_DenseLayer: 3-18 [4, 32, 128, 128] 82,880
β”‚ β”‚ └─_DenseLayer: 3-19 [4, 32, 128, 128] 87,040
β”‚ β”‚ └─_DenseLayer: 3-20 [4, 32, 128, 128] 91,200
β”‚ β”‚ └─_DenseLayer: 3-21 [4, 32, 128, 128] 95,360
β”‚ β”‚ └─_DenseLayer: 3-22 [4, 32, 128, 128] 99,520
β”‚ └─_Transition: 2-8 [4, 256, 64, 64] --
β”‚ β”‚ └─BatchNorm2d: 3-23 [4, 512, 128, 128] 1,024
β”‚ β”‚ └─ReLU: 3-24 [4, 512, 128, 128] --
β”‚ β”‚ └─Conv2d: 3-25 [4, 256, 128, 128] 131,072
β”‚ β”‚ └─AvgPool2d: 3-26 [4, 256, 64, 64] --
β”‚ └─_DenseBlock: 2-9 [4, 1024, 64, 64] --
β”‚ β”‚ └─_DenseLayer: 3-27 [4, 32, 64, 64] 70,400
β”‚ β”‚ └─_DenseLayer: 3-28 [4, 32, 64, 64] 74,560
β”‚ β”‚ └─_DenseLayer: 3-29 [4, 32, 64, 64] 78,720
β”‚ β”‚ └─_DenseLayer: 3-30 [4, 32, 64, 64] 82,880
β”‚ β”‚ └─_DenseLayer: 3-31 [4, 32, 64, 64] 87,040
β”‚ β”‚ └─_DenseLayer: 3-32 [4, 32, 64, 64] 91,200
β”‚ β”‚ └─_DenseLayer: 3-33 [4, 32, 64, 64] 95,360
β”‚ β”‚ └─_DenseLayer: 3-34 [4, 32, 64, 64] 99,520
β”‚ β”‚ └─_DenseLayer: 3-35 [4, 32, 64, 64] 103,680
β”‚ β”‚ └─_DenseLayer: 3-36 [4, 32, 64, 64] 107,840
β”‚ β”‚ └─_DenseLayer: 3-37 [4, 32, 64, 64] 112,000
β”‚ β”‚ └─_DenseLayer: 3-38 [4, 32, 64, 64] 116,160
β”‚ β”‚ └─_DenseLayer: 3-39 [4, 32, 64, 64] 120,320
β”‚ β”‚ └─_DenseLayer: 3-40 [4, 32, 64, 64] 124,480
β”‚ β”‚ └─_DenseLayer: 3-41 [4, 32, 64, 64] 128,640
β”‚ β”‚ └─_DenseLayer: 3-42 [4, 32, 64, 64] 132,800
β”‚ β”‚ └─_DenseLayer: 3-43 [4, 32, 64, 64] 136,960
β”‚ β”‚ └─_DenseLayer: 3-44 [4, 32, 64, 64] 141,120
β”‚ β”‚ └─_DenseLayer: 3-45 [4, 32, 64, 64] 145,280
β”‚ β”‚ └─_DenseLayer: 3-46 [4, 32, 64, 64] 149,440
β”‚ β”‚ └─_DenseLayer: 3-47 [4, 32, 64, 64] 153,600
β”‚ β”‚ └─_DenseLayer: 3-48 [4, 32, 64, 64] 157,760
β”‚ β”‚ └─_DenseLayer: 3-49 [4, 32, 64, 64] 161,920
β”‚ β”‚ └─_DenseLayer: 3-50 [4, 32, 64, 64] 166,080
β”‚ └─_Transition: 2-10 [4, 512, 32, 32] --
β”‚ β”‚ └─BatchNorm2d: 3-51 [4, 1024, 64, 64] 2,048
β”‚ β”‚ └─ReLU: 3-52 [4, 1024, 64, 64] --
β”‚ β”‚ └─Conv2d: 3-53 [4, 512, 64, 64] 524,288
β”‚ β”‚ └─AvgPool2d: 3-54 [4, 512, 32, 32] --
β”‚ └─_DenseBlock: 2-11 [4, 1024, 32, 32] --
β”‚ β”‚ └─_DenseLayer: 3-55 [4, 32, 32, 32] 103,680
β”‚ β”‚ └─_DenseLayer: 3-56 [4, 32, 32, 32] 107,840
β”‚ β”‚ └─_DenseLayer: 3-57 [4, 32, 32, 32] 112,000
β”‚ β”‚ └─_DenseLayer: 3-58 [4, 32, 32, 32] 116,160
β”‚ β”‚ └─_DenseLayer: 3-59 [4, 32, 32, 32] 120,320
β”‚ β”‚ └─_DenseLayer: 3-60 [4, 32, 32, 32] 124,480
β”‚ β”‚ └─_DenseLayer: 3-61 [4, 32, 32, 32] 128,640
β”‚ β”‚ └─_DenseLayer: 3-62 [4, 32, 32, 32] 132,800
β”‚ β”‚ └─_DenseLayer: 3-63 [4, 32, 32, 32] 136,960
β”‚ β”‚ └─_DenseLayer: 3-64 [4, 32, 32, 32] 141,120
β”‚ β”‚ └─_DenseLayer: 3-65 [4, 32, 32, 32] 145,280
β”‚ β”‚ └─_DenseLayer: 3-66 [4, 32, 32, 32] 149,440
β”‚ β”‚ └─_DenseLayer: 3-67 [4, 32, 32, 32] 153,600
β”‚ β”‚ └─_DenseLayer: 3-68 [4, 32, 32, 32] 157,760
β”‚ β”‚ └─_DenseLayer: 3-69 [4, 32, 32, 32] 161,920
β”‚ β”‚ └─_DenseLayer: 3-70 [4, 32, 32, 32] 166,080
β”‚ └─BatchNorm2d: 2-12 [4, 1024, 32, 32] 2,048
β”œβ”€AdaptiveMaxPool2d: 1-2 -- --
β”œβ”€ReLU: 1-3 [4, 1024, 32, 32] --
β”œβ”€Linear: 1-4 [4, 2] 2,050
β”œβ”€Sigmoid: 1-5 -- --
============
Total params: 6,955,906
Trainable params: 6,955,906
Non-trainable params: 0
Total mult-adds (G): 236.83
============
Input size (MB): 50.33
Forward/backward pass size (MB): 15091.11
Params size (MB): 27.82
Estimated Total Size (MB): 15169.26
============

And this is with img_size=512:

============
Layer (type:depth-idx) Output Shape Param #
============
DenseNet121_change_avg [8, 2] --
β”œβ”€Sequential: 1-1 [4, 1024, 16, 16] --
β”‚ └─Conv2d: 2-1 [4, 64, 256, 256] 9,408
β”‚ └─BatchNorm2d: 2-2 [4, 64, 256, 256] 128
β”‚ └─ReLU: 2-3 [4, 64, 256, 256] --
β”‚ └─MaxPool2d: 2-4 [4, 64, 128, 128] --
β”‚ └─_DenseBlock: 2-5 [4, 256, 128, 128] --
β”‚ β”‚ └─_DenseLayer: 3-1 [4, 32, 128, 128] 45,440
β”‚ β”‚ └─_DenseLayer: 3-2 [4, 32, 128, 128] 49,600
β”‚ β”‚ └─_DenseLayer: 3-3 [4, 32, 128, 128] 53,760
β”‚ β”‚ └─_DenseLayer: 3-4 [4, 32, 128, 128] 57,920
β”‚ β”‚ └─_DenseLayer: 3-5 [4, 32, 128, 128] 62,080
β”‚ β”‚ └─_DenseLayer: 3-6 [4, 32, 128, 128] 66,240
β”‚ └─_Transition: 2-6 [4, 128, 64, 64] --
β”‚ β”‚ └─BatchNorm2d: 3-7 [4, 256, 128, 128] 512
β”‚ β”‚ └─ReLU: 3-8 [4, 256, 128, 128] --
β”‚ β”‚ └─Conv2d: 3-9 [4, 128, 128, 128] 32,768
β”‚ β”‚ └─AvgPool2d: 3-10 [4, 128, 64, 64] --
β”‚ └─_DenseBlock: 2-7 [4, 512, 64, 64] --
β”‚ β”‚ └─_DenseLayer: 3-11 [4, 32, 64, 64] 53,760
β”‚ β”‚ └─_DenseLayer: 3-12 [4, 32, 64, 64] 57,920
β”‚ β”‚ └─_DenseLayer: 3-13 [4, 32, 64, 64] 62,080
β”‚ β”‚ └─_DenseLayer: 3-14 [4, 32, 64, 64] 66,240
β”‚ β”‚ └─_DenseLayer: 3-15 [4, 32, 64, 64] 70,400
β”‚ β”‚ └─_DenseLayer: 3-16 [4, 32, 64, 64] 74,560
β”‚ β”‚ └─_DenseLayer: 3-17 [4, 32, 64, 64] 78,720
β”‚ β”‚ └─_DenseLayer: 3-18 [4, 32, 64, 64] 82,880
β”‚ β”‚ └─_DenseLayer: 3-19 [4, 32, 64, 64] 87,040
β”‚ β”‚ └─_DenseLayer: 3-20 [4, 32, 64, 64] 91,200
β”‚ β”‚ └─_DenseLayer: 3-21 [4, 32, 64, 64] 95,360
β”‚ β”‚ └─_DenseLayer: 3-22 [4, 32, 64, 64] 99,520
β”‚ └─_Transition: 2-8 [4, 256, 32, 32] --
β”‚ β”‚ └─BatchNorm2d: 3-23 [4, 512, 64, 64] 1,024
β”‚ β”‚ └─ReLU: 3-24 [4, 512, 64, 64] --
β”‚ β”‚ └─Conv2d: 3-25 [4, 256, 64, 64] 131,072
β”‚ β”‚ └─AvgPool2d: 3-26 [4, 256, 32, 32] --
β”‚ └─_DenseBlock: 2-9 [4, 1024, 32, 32] --
β”‚ β”‚ └─_DenseLayer: 3-27 [4, 32, 32, 32] 70,400
β”‚ β”‚ └─_DenseLayer: 3-28 [4, 32, 32, 32] 74,560
β”‚ β”‚ └─_DenseLayer: 3-29 [4, 32, 32, 32] 78,720
β”‚ β”‚ └─_DenseLayer: 3-30 [4, 32, 32, 32] 82,880
β”‚ β”‚ └─_DenseLayer: 3-31 [4, 32, 32, 32] 87,040
β”‚ β”‚ └─_DenseLayer: 3-32 [4, 32, 32, 32] 91,200
β”‚ β”‚ └─_DenseLayer: 3-33 [4, 32, 32, 32] 95,360
β”‚ β”‚ └─_DenseLayer: 3-34 [4, 32, 32, 32] 99,520
β”‚ β”‚ └─_DenseLayer: 3-35 [4, 32, 32, 32] 103,680
β”‚ β”‚ └─_DenseLayer: 3-36 [4, 32, 32, 32] 107,840
β”‚ β”‚ └─_DenseLayer: 3-37 [4, 32, 32, 32] 112,000
β”‚ β”‚ └─_DenseLayer: 3-38 [4, 32, 32, 32] 116,160
β”‚ β”‚ └─_DenseLayer: 3-39 [4, 32, 32, 32] 120,320
β”‚ β”‚ └─_DenseLayer: 3-40 [4, 32, 32, 32] 124,480
β”‚ β”‚ └─_DenseLayer: 3-41 [4, 32, 32, 32] 128,640
β”‚ β”‚ └─_DenseLayer: 3-42 [4, 32, 32, 32] 132,800
β”‚ β”‚ └─_DenseLayer: 3-43 [4, 32, 32, 32] 136,960
β”‚ β”‚ └─_DenseLayer: 3-44 [4, 32, 32, 32] 141,120
β”‚ β”‚ └─_DenseLayer: 3-45 [4, 32, 32, 32] 145,280
β”‚ β”‚ └─_DenseLayer: 3-46 [4, 32, 32, 32] 149,440
β”‚ β”‚ └─_DenseLayer: 3-47 [4, 32, 32, 32] 153,600
β”‚ β”‚ └─_DenseLayer: 3-48 [4, 32, 32, 32] 157,760
β”‚ β”‚ └─_DenseLayer: 3-49 [4, 32, 32, 32] 161,920
β”‚ β”‚ └─_DenseLayer: 3-50 [4, 32, 32, 32] 166,080
β”‚ └─_Transition: 2-10 [4, 512, 16, 16] --
β”‚ β”‚ └─BatchNorm2d: 3-51 [4, 1024, 32, 32] 2,048
β”‚ β”‚ └─ReLU: 3-52 [4, 1024, 32, 32] --
β”‚ β”‚ └─Conv2d: 3-53 [4, 512, 32, 32] 524,288
β”‚ β”‚ └─AvgPool2d: 3-54 [4, 512, 16, 16] --
β”‚ └─_DenseBlock: 2-11 [4, 1024, 16, 16] --
β”‚ β”‚ └─_DenseLayer: 3-55 [4, 32, 16, 16] 103,680
β”‚ β”‚ └─_DenseLayer: 3-56 [4, 32, 16, 16] 107,840
β”‚ β”‚ └─_DenseLayer: 3-57 [4, 32, 16, 16] 112,000
β”‚ β”‚ └─_DenseLayer: 3-58 [4, 32, 16, 16] 116,160
β”‚ β”‚ └─_DenseLayer: 3-59 [4, 32, 16, 16] 120,320
β”‚ β”‚ └─_DenseLayer: 3-60 [4, 32, 16, 16] 124,480
β”‚ β”‚ └─_DenseLayer: 3-61 [4, 32, 16, 16] 128,640
β”‚ β”‚ └─_DenseLayer: 3-62 [4, 32, 16, 16] 132,800
β”‚ β”‚ └─_DenseLayer: 3-63 [4, 32, 16, 16] 136,960
β”‚ β”‚ └─_DenseLayer: 3-64 [4, 32, 16, 16] 141,120
β”‚ β”‚ └─_DenseLayer: 3-65 [4, 32, 16, 16] 145,280
β”‚ β”‚ └─_DenseLayer: 3-66 [4, 32, 16, 16] 149,440
β”‚ β”‚ └─_DenseLayer: 3-67 [4, 32, 16, 16] 153,600
β”‚ β”‚ └─_DenseLayer: 3-68 [4, 32, 16, 16] 157,760
β”‚ β”‚ └─_DenseLayer: 3-69 [4, 32, 16, 16] 161,920
β”‚ β”‚ └─_DenseLayer: 3-70 [4, 32, 16, 16] 166,080
β”‚ └─BatchNorm2d: 2-12 [4, 1024, 16, 16] 2,048
β”œβ”€AdaptiveMaxPool2d: 1-2 -- --
β”œβ”€ReLU: 1-3 [4, 1024, 16, 16] --
β”œβ”€Linear: 1-4 [8, 2] 1,026
β”œβ”€Sigmoid: 1-5 -- --
============
Total params: 6,954,882
Trainable params: 6,954,882
Non-trainable params: 0
Total mult-adds (G): 59.21
============
Input size (MB): 12.58
Forward/backward pass size (MB): 3772.78
Params size (MB): 27.82
Estimated Total Size (MB): 3813.18
============

I can’t see what’s wrong… can anyone help?

I guess you are using a wrong shape in this view operation:

x = x.view(-1, config.IMG_SIZE)

and are changing the batch size.
Replace it with:

x = x.view(x.size(0), -1)

to keep the batch size equal and to flatten all other dimensions.
This could then yield a shape mismatch error in the self.mlp layer, in which case you would need to set its in_features to the reported activation size.
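
To see what is happening, here is a minimal sketch using the shapes from your summary (with IMG_SIZE=512 the pooled activation has shape [4, 1024, 1, 1]):

import torch

# output of the pooling step for a batch of 4: [4, 1024, 1, 1]
x = torch.randn(4, 1024, 1, 1)

# view(-1, 512) keeps all 4 * 1024 = 4096 values but regroups them into
# rows of 512, silently doubling the batch dimension from 4 to 8
print(x.view(-1, 512).shape)        # torch.Size([8, 512])

# view(x.size(0), -1) pins the batch dimension and flattens the rest
print(x.view(x.size(0), -1).shape)  # torch.Size([4, 1024])

which is exactly the batch size mismatch (8 vs. 4) reported by the loss function.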

Hi @ptrblck, thank you very much for replying ^^

When I make the change that you suggest, I need to set the in_features value to 1024, and that's exactly my problem. I had understood that the value of in_features had to be equal to the size of the new network layer, that is, equal to the height or width of the image.

I need to change the DenseNet input to receive images bigger than [224, 224, 3]. When I tested it with a large image size of [1024, 1024, 3] it worked (pure luck, by the way… =/), but an image size of [512, 512, 3] gives the above error.

How do I change the DenseNet input layer to receive any image size without shrinking to [224,224,3]?

No, that shouldn’t be the case and the in_features should be set to the number of features of the incoming activation.
In your case the input activation is the flattened output of the self.maxpool(x) operation which seems to have 1024 features.
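
You can verify this with a small sketch (the 1024 comes from the DenseNet121 architecture itself, not from the image size):

import torch
import torch.nn as nn
import torchvision

features = torchvision.models.densenet121().features
pool = nn.AdaptiveMaxPool2d(1)

# The channel count is fixed by the architecture; only the spatial size
# depends on the input, and AdaptiveMaxPool2d(1) collapses it to 1x1.
with torch.no_grad():
    for size in (224, 512, 1024):
        x = torch.randn(1, 3, size, size)
        print(pool(features(x)).flatten(1).shape)  # torch.Size([1, 1024]) every time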

It’s already working with my suggested fix:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class DenseNet121_reshaped(nn.Module):
    def __init__(self):
        super(DenseNet121_reshaped, self).__init__()
        self.densenet121 = torchvision.models.densenet121().features
        # note: this registered module is never called in forward, since the
        # maxpool method below wins the attribute lookup; both reduce HxW to 1x1
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.relu = nn.ReLU()
        self.mlp = nn.Linear(in_features=1024, out_features=10)

    def maxpool(self, x):
        x = F.max_pool2d(x, kernel_size=x.size()[2:])
        return x

    def forward(self, x):
        x = self.densenet121(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = x.view(x.size(0), -1)
        x = self.mlp(x)
        return x

model = DenseNet121_reshaped()

x = torch.randn(2, 3, 224, 224)
out = model(x)
print(out.shape)
# torch.Size([2, 10])

x = torch.randn(3, 3, 1024, 1024)
out = model(x)
print(out.shape)
# torch.Size([3, 10])

x = torch.randn(4, 3, 700, 800)
out = model(x)
print(out.shape)
# torch.Size([4, 10])

Thanks a lot for your explanation!!

It’s working now!

Just a few beginner questions:

  • Can I use the pretrained DenseNet, like this?
    self.densenet121 = torchvision.models.densenet121(weights=torchvision.models.DenseNet121_Weights.IMAGENET1K_V1).features

  • Is “out_features=10” equal to the number of classes?
    self.mlp = nn.Linear(in_features=1024, out_features=10)

  • Why did you remove the self.sigmoid = nn.Sigmoid()? Is it wrong to use it?

Thanks again!

1 Like

Yes, your usage looks generally correct, but make sure you are using the .features sequential block as intended. In particular, you would be missing the F.relu and F.adaptive_avg_pool2d calls from the stock forward method. In your custom model you are using a max pooling layer instead, which is also fine; just be aware of these differences.
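
For reference, this is roughly what the stock torchvision forward pass does after .features (a paraphrase of the torchvision implementation, not a verbatim copy):

import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.densenet121()
x = torch.randn(2, 3, 224, 224)

features = model.features(x)              # [2, 1024, 7, 7]
out = F.relu(features, inplace=True)      # the stock model applies a ReLU...
out = F.adaptive_avg_pool2d(out, (1, 1))  # ...then average pooling (not max)
out = torch.flatten(out, 1)               # [2, 1024]
out = model.classifier(out)               # [2, 1000] ImageNet logits
print(out.shape)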

Yes, I just used a value to make the code executable. In your case you would set it to the number of classes.

You didn’t use it in the forward method, and assuming you are working on a multi-class classification, it would also be wrong to use it, since nn.CrossEntropyLoss expects raw logits.
If you are working on a binary classification, also drop it: use nn.BCEWithLogitsLoss for better numerical stability and pass the raw logit output of the last linear layer to the loss function.
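
Here is a minimal sketch of both options (the shapes and values are random and just for illustration):

import torch
import torch.nn as nn

# Multi-class (or two classes with out_features=2): nn.CrossEntropyLoss
# applies log-softmax internally, so the model must return raw logits.
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 2)            # raw outputs, no softmax/sigmoid
targets = torch.randint(0, 2, (4,))   # class indices, shape [4]
print(ce(logits, targets))

# Binary alternative: one output unit with nn.BCEWithLogitsLoss, which
# fuses the sigmoid into the loss for numerical stability.
bce = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 1)                     # one raw logit per sample
targets = torch.randint(0, 2, (4, 1)).float()  # float targets in {0, 1}
print(bce(logits, targets))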