Hi,
I am trying to replicate the results of the C3D model, but I found that it occupies more GPU memory than I estimated.
Here is the model:
### self.features_frame
self.features_frame = [
### part 1
nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(64),
nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)),
nn.ReLU(True),
### part 2
nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(128),
nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)),
nn.ReLU(True),
### part 3
nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(256), nn.ReLU(True),
nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(256),
nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)),
nn.ReLU(True),
### part 4
nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(256), nn.ReLU(True),
nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(256),
nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)),
nn.ReLU(True),
### part 5
nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(256), nn.ReLU(True),
nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(512),
nn.MaxPool3d(kernel_size=(2, 7, 7), stride=(2, 2, 2)),
nn.ReLU(True),
]
self.features_frame = nn.Sequential(*self.features_frame)
### self.classifier
self.classifier = [
nn.Linear(512, 128),
norm_layer(128), nn.ReLU(True),
nn.Linear(128, 10)
]
self.classifier = nn.Sequential(*self.classifier)
I am using PyTorch 0.2. The batch size is 1 and the input size is (1, 3, 16, 112, 112). The model occupies 1035 MB of GPU memory.
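For reference, here is the kind of back-of-the-envelope estimate I mean: a plain-Python sketch that traces the output shape through each conv/pool layer of the feature extractor and sums the activation sizes. It assumes float32 (4 bytes per element) and counts only layer outputs, ignoring parameters, gradients, and any cuDNN workspace buffers, so it is a lower bound, not a profiler.

```python
# Rough activation-memory estimate for the C3D-style feature extractor above.
# Counts only float32 layer outputs; parameters, gradients, and cuDNN
# workspace buffers are deliberately ignored.

def conv3d_out(shape, out_ch):
    # kernel 3, padding 1, stride 1 -> spatial dims unchanged
    c, d, h, w = shape
    return (out_ch, d, h, w)

def pool3d_out(shape, kernel, stride):
    c, d, h, w = shape
    kd, kh, kw = kernel
    sd, sh, sw = stride
    return (c, (d - kd) // sd + 1, (h - kh) // sh + 1, (w - kw) // sw + 1)

def estimate_mb(layers, in_shape):
    shape, total = in_shape, 0
    for kind, arg in layers:
        shape = conv3d_out(shape, arg) if kind == "conv" else pool3d_out(shape, *arg)
        n = 1
        for s in shape:
            n *= s
        total += n
    return total * 4 / 1024 ** 2  # float32 bytes -> MB

# Layer list mirrors self.features_frame (norm/ReLU layers keep the shape).
features = [
    ("conv", 64),  ("pool", ((1, 2, 2), (1, 2, 2))),
    ("conv", 128), ("pool", ((2, 2, 2), (2, 2, 2))),
    ("conv", 256), ("conv", 256), ("pool", ((2, 2, 2), (2, 2, 2))),
    ("conv", 256), ("conv", 256), ("pool", ((2, 2, 2), (2, 2, 2))),
    ("conv", 256), ("conv", 512), ("pool", ((2, 7, 7), (2, 2, 2))),
]
print(round(estimate_mb(features, (3, 16, 112, 112))))  # → 104
```

So the stored activations alone account for only about 100 MB of the 1035 MB; the rest is presumably parameters, CUDA context, and framework-internal buffers.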
However, if I just change the channel counts of the two Conv3d layers in part 5 from 256 to 512:
### part 5
nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(512), nn.ReLU(True),
nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
norm_layer(512),
nn.MaxPool3d(kernel_size=(2, 7, 7), stride=(2, 2, 2)),
nn.ReLU(True),
It occupies 10013 MB of GPU memory, which is almost ten times larger than 1035 MB.
I have read the previous questions about GPU memory, but I still have no idea why this happens. From my calculation, the modified network should occupy at most 4 or 5 times as much memory as the original one, not 10 times.
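To make the calculation concrete, here is a small plain-Python sketch comparing the two part-5 variants by parameter count and output-activation size (float32, batch size 1; the input to part 5 is 256 × 2 × 7 × 7 at that point in the network):

```python
# Compare part 5 of the two variants: conv weights + biases and output
# activations, in MB (float32, batch 1, part-5 input is 256 x 2 x 7 x 7).

def conv3d_params(cin, cout, k=3):
    return cin * cout * k ** 3 + cout  # weights + bias

def mb(n_elems):
    return n_elems * 4 / 1024 ** 2  # float32 bytes -> MB

d, h, w = 2, 7, 7  # spatial size entering part 5

# (c1, c2) are the output channels of the two Conv3d layers in part 5.
for name, (c1, c2) in [("original", (256, 512)), ("modified", (512, 512))]:
    params = conv3d_params(256, c1) + conv3d_params(c1, c2)
    acts = (c1 + c2) * d * h * w
    print(f"{name}: params {mb(params):.1f} MB, activations {mb(acts):.1f} MB")
```

By this count the modified part 5 adds only about 20 MB of parameters and well under 1 MB of activations, so neither explains a ~9 GB jump. My guess is that the extra memory comes from framework-internal buffers (e.g. the convolution workspace cuDNN picks for the larger 512-channel Conv3d) rather than from the tensors themselves, but I am not sure.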
I would appreciate it if someone could help me. Thanks.