'forward_pre_hook' on module causes CUDA OOM errors

Hi.
I am trying to add zero padding before particular convolution layers, but with the code below a CUDA OOM error occurs after ~40 epochs (without the pre-hook there is no OOM). I used a forward_pre_hook as follows:

import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm


def conv_layer(inplanes, planes, kernel_size, stride, padding, bias, do_spectral_norm):
    conv = nn.Conv2d(in_channels=inplanes, out_channels=planes, kernel_size=kernel_size,
                     stride=stride, padding=padding, bias=bias)

    # For 3x3 / stride-2 / padding-1 convs, drop the symmetric built-in padding
    # and instead zero-pad only the right and bottom edges in a pre-hook.
    if conv.kernel_size == (3, 3) and conv.stride == (2, 2) and conv.padding == (1, 1):
        conv.padding = (0, 0)

        def forward_pre_hook(module, input):
            print('forward_pre_hook has been fired.')
            # `input` is a tuple of the positional forward args; pad the single
            # input tensor with pad=(left, right, top, bottom) = (0, 1, 0, 1).
            modified_input = F.pad(input[0], pad=(0, 1, 0, 1))
            return modified_input

        conv.register_forward_pre_hook(forward_pre_hook)

    if do_spectral_norm:
        return spectral_norm(conv)
    return conv

This conv_layer function is called several times in the main module.
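For completeness, a minimal way to exercise such a layer looks like this (the batch size, channel counts, and spatial size are just illustrative, not my real model):

import torch

layer = conv_layer(inplanes=64, planes=128, kernel_size=3, stride=2,
                   padding=1, bias=False, do_spectral_norm=True)

x = torch.randn(8, 64, 32, 32)
out = layer(x)    # prints 'forward_pre_hook has been fired.'
print(out.shape)  # torch.Size([8, 128, 16, 16]), same as with symmetric padding=1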

Question 1.
After I enabled the forward_pre_hook, an increase in allocated GPU memory was observed (18116 MB to 21408 MB). I think this is natural behavior, since an extra node corresponding to the added padding op would be created in the backward graph. However, as training proceeds it keeps growing, to 24146 MB and then 24154 MB, and this is the part I do not understand. How can I stop the memory leak? (Note that I set cudnn.deterministic to True and cudnn.benchmark to False for an exact observation of the memory.)
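For reference, this is roughly how such per-epoch numbers can be tracked (log_gpu_memory is just an illustrative debugging helper; note that nvidia-smi reports the larger reserved pool rather than the allocated bytes):

import torch

def log_gpu_memory(tag):
    # Report CUDA caching-allocator statistics in MB.
    torch.cuda.synchronize()
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f'{tag}: allocated={alloc:.0f}MB reserved={reserved:.0f}MB peak={peak:.0f}MB')
    torch.cuda.reset_peak_memory_stats()  # start a fresh peak window per epoch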

Question 2.
Or could spectral_norm be the problem? As far as I know, the spectral_norm implementation uses a forward_pre_hook as well, so could having two pre-hooks on the same module be the source of the problem? (FYI, I don't fully know the details inside torch.nn.utils.spectral_norm.)
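To check whether the two pre-hooks actually coexist on a wrapped layer, they can be listed like this (this peeks at the private _forward_pre_hooks attribute, so it is debugging-only and may change between versions):

# Debugging sketch: list the pre-hooks registered on one wrapped conv.
layer = conv_layer(inplanes=64, planes=128, kernel_size=3, stride=2,
                   padding=1, bias=False, do_spectral_norm=True)
for hook_id, hook in layer._forward_pre_hooks.items():
    print(hook_id, hook)
# I expect both my padding hook and spectral_norm's SpectralNorm callable,
# fired in registration order on every forward pass.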

Any suggestions are welcome. Thank you.

Torch version: 1.7.1
Python version: 3.7.4