Some questions about requires_grad = False

Hello, everyone.

When I tried to freeze part of my model during training, I set requires_grad to False, but I found that memory usage more than doubled at that point. Does PyTorch save the weights and copy them to continue execution? If so, is there a better way to save memory, since the GPU now runs out of memory?

I tracked my memory usage; it stayed stable until the model was frozen.

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 52 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 44 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 42 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 50 * Size:()                   | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 54 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 46 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 52 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 44 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 48 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
+ | 56 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 54 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 46 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8269.6 Mb

+ | 50 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
+ | 58 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 48 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 56 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 60 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 52 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 50 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 58 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 62 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 54 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 60 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 52 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 56 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 64 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 62 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 54 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 58 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 66 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 56 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 64 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 68 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 60 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 58 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 66 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 70 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 62 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 68 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 60 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:20112.1 Mb

Looking forward to your reply :).

This is not the answer to your question, but how did you get the memory info? Just the memory of parameters?

Hi, sio277.
I use the following script, based on pynvml, to track memory usage:

Pytorch-Memory-Utils/gpu_mem_track.py at master · Oldpan/Pytorch-Memory-Utils · GitHub
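(For reference, that tracker is built on pynvml queries along these lines; GPU index 0 is an assumption:)

import pynvml

# Query the total used memory on the GPU via NVML.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes GPU 0
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Total Used Memory: {info.used / 1024 ** 2:.1f} Mb")
pynvml.nvmlShutdown()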

I now want to switch between different loss values during training. At the beginning of training I use the following function to freeze the model, but when I unfreeze it during training, I get an out-of-memory error.

    @staticmethod
    def set_grads(mod, state):
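        # Enable or disable gradient tracking for every parameter of mod.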
        for para in mod.parameters():
            para.requires_grad = state
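
For context, a hypothetical call site (self.backbone is a placeholder name, not from my actual code):

# Freeze the backbone at the start of training...
self.set_grads(self.backbone, False)
# ...and later unfreeze it to train everything.
self.set_grads(self.backbone, True)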

By the way, I set a flag (True or False) to select between the different losses, but with it I get an out-of-memory error; when the flag is removed, everything is fine.

I wonder why that is.

Looking forward to your reply :).

Thanks. I don’t know the exact situation because I can’t see the whole code, but how about using detach() or torch.no_grad() to freeze part of your model? I think it is a safer approach.
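
For illustration, a minimal sketch of both options; the frozen_part / trainable_part split and the sizes are placeholders, not from your code:

import torch
import torch.nn as nn

frozen_part = nn.Linear(16, 16)    # placeholder for the frozen layers
trainable_part = nn.Linear(16, 4)  # placeholder for the rest

x = torch.randn(8, 16)

# Option 1: detach() cuts the graph after the frozen part,
# so no gradients flow back into it.
h = frozen_part(x).detach()
out = trainable_part(h)

# Option 2: torch.no_grad() skips graph construction for the
# frozen part entirely, which also saves activation memory.
with torch.no_grad():
    h = frozen_part(x)
out = trainable_part(h)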

Thank you for your reply!

detach() will create a new tensor for the modules that do not require gradients, which seems even less suitable here. As for torch.no_grad(), it seems it can only be applied to a fixed forward pass like the one below. If I need to unfreeze the model during training, how should I handle torch.no_grad()?

class xxnet(nn.Module):
    def __init__(self):
        super().__init__()
        ...
        self.layer1 = xx
        self.layer2 = xx
        self.fc = xx

    def forward(self, x):
        with torch.no_grad():
            x = self.layer1(x)
            x = self.layer2(x)
        x = self.fc(x)
        return x

Thanks for patiently answering my questions :)

Then, consider using detach_(), which is an in-place operation.

If you want torch.no_grad() but also need to unfreeze the model during training, how about something like this:

model = xxnet()
if freeze:
    with torch.no_grad():
        out = model(input_tensor)
else:
    out = model(input_tensor)
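
Equivalently, torch.set_grad_enabled takes the flag directly, so one code path covers both cases. A minimal sketch, assuming the same freeze flag (the model and input below are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(16, 4)           # placeholder model
input_tensor = torch.randn(8, 16)  # placeholder input
freeze = True

# Gradients are tracked only when freeze is False.
with torch.set_grad_enabled(not freeze):
    out = model(input_tensor)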

Hello, if you use torch.no_grad(), does it use the same amount of memory? That line freezes all gradients.

It reduces memory consumption, because the intermediate forward activations needed to compute gradients are not saved.
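
A quick way to see this yourself, assuming a CUDA device is available (the layer sizes below are arbitrary):

import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda()
x = torch.randn(64, 4096, device="cuda")

torch.cuda.reset_peak_memory_stats()
y = model(x)  # intermediate activations are kept for backward
print("with grad:   ", torch.cuda.max_memory_allocated() / 1024 ** 2, "Mb")

del y
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    y = model(x)  # intermediate activations are freed immediately
print("without grad:", torch.cuda.max_memory_allocated() / 1024 ** 2, "Mb")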


Thanks a lot, I used this structure and it solved my problem! :)