Hi.
I tried to add zero padding before particular convolution layers, but with the code below a CUDA out-of-memory (OOM) error occurs after ~40 epochs (without the pre-hook there is no OOM). I used a forward_pre_hook as follows:
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

def conv_layer(inplanes, planes, kernel_size, stride, padding, bias, do_spectral_norm):
    conv = nn.Conv2d(in_channels=inplanes, out_channels=planes, kernel_size=kernel_size,
                     stride=stride, padding=padding, bias=bias)
    if conv.kernel_size == (3, 3) and conv.stride == (2, 2) and conv.padding == (1, 1):
        # Replace the symmetric (1, 1) padding with asymmetric (0, 1, 0, 1)
        # zero padding, applied through a forward pre-hook.
        conv.padding = (0, 0)

        def forward_pre_hook(module, input):
            print('forward_pre_hook has been fired.')
            modified_input = F.pad(input[0], pad=(0, 1, 0, 1))
            return modified_input

        conv.register_forward_pre_hook(forward_pre_hook)
    if do_spectral_norm:
        return spectral_norm(conv)
    else:
        return conv
This conv_layer function is called several times in the main module.
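As an aside, a hook-free way to get the same asymmetric padding would be to wrap the convolution in an nn.Sequential with an explicit nn.ZeroPad2d layer. This is only a sketch of the idea (the function name conv_layer_padfirst is made up here); it assumes the goal is exactly the (0, 1, 0, 1) padding from the hook above:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def conv_layer_padfirst(inplanes, planes, kernel_size, stride, bias, do_spectral_norm):
    # nn.ZeroPad2d takes (left, right, top, bottom), the same order as F.pad
    # for the last two dimensions, so (0, 1, 0, 1) pads right and bottom by 1.
    conv = nn.Conv2d(inplanes, planes, kernel_size, stride, padding=0, bias=bias)
    if do_spectral_norm:
        conv = spectral_norm(conv)
    return nn.Sequential(nn.ZeroPad2d((0, 1, 0, 1)), conv)

layer = conv_layer_padfirst(3, 8, kernel_size=3, stride=2, bias=False, do_spectral_norm=True)
out = layer(torch.randn(1, 3, 32, 32))  # 32 -> padded to 33 -> strided conv -> 16
```

Since the padding is a plain module rather than a hook, there is no interaction with spectral_norm's own pre-hook at all.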
Question 1.
After I turned on the forward_pre_hook, I observed an increase in allocated GPU memory (18116MB to 21408MB). I think this is natural behavior, since an extra backward graph corresponding to the added padding node would be created. However, as training proceeds, it increased further to 24146MB and then 24154MB - this is what I do not understand. How can I stop the memory leak? (Note that I set cudnn.deterministic to True and cudnn.benchmark to False for exact observation of the memory.)
Question 2.
Or could spectral_norm be the problem? As far as I know, the spectral_norm implementation uses a forward_pre_hook as well, so could duplication of those hooks be the source of the problem? (FYI, I don't fully know the details inside torch.nn.utils.spectral_norm.)
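To check whether the two pre-hooks actually coexist (rather than one overwriting the other), I can inspect the module's _forward_pre_hooks dict directly. A small sketch, using a stand-in lambda for the padding hook:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=0)
conv.register_forward_pre_hook(lambda module, inp: None)  # stand-in for the padding hook
conv = spectral_norm(conv)

# _forward_pre_hooks is an OrderedDict mapping hook id -> callable;
# torch.nn.utils.spectral_norm registers one SpectralNorm object here.
for hook_id, hook in conv._forward_pre_hooks.items():
    print(hook_id, type(hook).__name__)
```

Hooks registered earlier run first, so the padding hook fires before spectral_norm's weight update; the two do not replace each other.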
Any suggestions are welcome. Thank you.
Torch version: 1.7.1
Python version: 3.7.4