Why no inplace in evaluation mode?

Hi,

I use the VGG16 network, changed its last FC layers to convolutional ones, and added some FC layers after that.
During training I extract channels for 1000 pixels of the image from different layers and combine them into hypercolumns. These are sent through the classifier.

With one image of 224*224 pixels and hypercolumns of 5568 * 32 (float) in size each, this makes 8.9 GB, which is too big for my GPU with 8 GB.

So I just send one half of the image through the classifier, save the result, then send the second half through the classifier and combine the results.

Unfortunately, although I specified inplace operations in the classifier, the operations are never performed in place during the evaluation phase.

Can I somehow free the memory after the first half of the picture has been sent through the classifier?
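For reference, a minimal sketch of the chunked-inference idea (the classifier and shapes here are stand-ins, not the actual network): wrapping the forward in `torch.no_grad()` stops PyTorch from keeping activations for backward, and dropping the reference to each chunk (plus `torch.cuda.empty_cache()` on GPU) lets the allocator reuse the first half's memory before the second half runs.

```python
import torch
import torch.nn as nn

# toy stand-in for the real classifier (assumed sizes, not the actual net)
classifier = nn.Sequential(
    nn.Linear(5568, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 3),
)
classifier.eval()

hypercolumns = torch.randn(1000, 5568)  # 1000 pixels x 5568 channels

results = []
with torch.no_grad():  # no autograd buffers are kept during evaluation
    for chunk in hypercolumns.chunk(2, dim=0):  # process each half separately
        results.append(classifier(chunk))
        del chunk  # drop the reference so the memory can be reused
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the GPU

output = torch.cat(results, dim=0)  # recombine the two halves
```

Note that `empty_cache()` only matters on CUDA; on CPU the `del` alone frees the chunk once nothing references it.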

Here are my net and classifier:
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace)
(4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace)
(9): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace)
(16): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace)
(23): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace)
(30): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)

self.classifier = nn.Sequential(
nn.Linear(5568, 4096),
nn.ReLU(inplace=True),
nn.Dropout(inplace=True),
nn.Linear(4096, 4096),
nn.ReLU(inplace=True),
nn.Dropout(inplace=True),
nn.Linear(4096, 3)
)
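One reason the inplace flags alone don't cap memory here (a small illustration, not from the post above): `ReLU(inplace=True)` really does overwrite its input storage, but every `Linear` still allocates a fresh output tensor, and those large activations are what fill the GPU.

```python
import torch
import torch.nn as nn

relu = nn.ReLU(inplace=True)
x = torch.randn(4, 5568)
y = relu(x)        # overwrites x's storage and returns the same tensor

lin = nn.Linear(5568, 4096)
z = lin(y)         # a Linear always allocates a fresh output tensor
```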

If you want to backpropagate through the whole thing, then PyTorch needs to remember a lot of details for the backward pass. I assume that PyTorch is pretty good at saving only what it needs to, so you probably won't be able to optimise that much.

On the other hand, if you are using pre-trained weights for most of the VGG layers, then you should be able to tell PyTorch that those layers don't need backpropagation.

Something like this ought to work.

# tell pytorch not to optimise the frozen layers
for param in layers_that_dont_need_training.parameters():
    param.requires_grad = False

# do the forward like this
with torch.no_grad():
    features = layers_that_dont_need_training(x)
output = layers_that_do_need_training(features)
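Fleshing that out with small stand-in modules (the names and shapes are placeholders, not the real VGG16): only the trainable head builds an autograd graph, so `backward()` never touches the frozen part and no activations are saved for it.

```python
import torch
import torch.nn as nn

# small stand-ins for the real modules; shapes are illustrative only
frozen_features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(inplace=True))
trainable_head = nn.Linear(8 * 4 * 4, 3)

# tell pytorch not to optimise the frozen layers
for param in frozen_features.parameters():
    param.requires_grad = False

x = torch.randn(2, 3, 4, 4)
with torch.no_grad():                   # no autograd graph for the frozen part
    feats = frozen_features(x)

out = trainable_head(feats.flatten(1))  # the graph starts here
out.sum().backward()                    # gradients reach only the head
```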