Some questions about requires_grad = False

Hello, everyone.

When I tried to freeze part of my model during training, I set requires_grad to False, but I found that memory usage more than doubled at that point. Does PyTorch save the weights and copy them to continue execution? If so, is there a better way to save memory, since the GPU now runs out of memory?

I tracked my memory usage; it stayed stable until the model was frozen.

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 52 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 44 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 42 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 50 * Size:()                   | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 54 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 46 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 52 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 44 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 48 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
+ | 56 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 54 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 46 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8269.6 Mb

+ | 50 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
+ | 58 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 48 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 56 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 60 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 52 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 50 * Size:(1,)                 | Memory: 0.0001 M | <class 'torch.Tensor'> | torch.float32
- | 58 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 62 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 54 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 60 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 52 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 56 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 64 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 62 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 54 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 58 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 66 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 56 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 64 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 68 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 60 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 58 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 66 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:8267.4 Mb

+ | 70 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
+ | 62 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 68 * Size:()                   | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32
- | 60 * Size:(1,)                 | Memory: 0.0002 M | <class 'torch.Tensor'> | torch.float32

At Backbone_idsub <module>: line 202                 Total Used Memory:20112.1 Mb

Looking forward to your reply :).

This is not the answer to your question, but how did you get the memory info? Just the memory of parameters?

Hi, sio277.
I use the following script, based on pynvml, to track memory usage:

Pytorch-Memory-Utils/gpu_mem_track.py at master · Oldpan/Pytorch-Memory-Utils · GitHub
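(For reference, that tracker is built on pynvml queries along these lines; GPU index 0 is an assumption:)

import pynvml

# Query the total used memory on the GPU via NVML.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes GPU 0
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Total Used Memory: {info.used / 1024 ** 2:.1f} Mb")
pynvml.nvmlShutdown()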

I now want to switch between different loss values during training. At the beginning of training I use the following function to freeze the model, but when I unfreeze it during training, I get an out-of-memory error.

    @staticmethod
    def set_grads(mod, state):
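        # Enable or disable gradient tracking for every parameter of mod.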
        for para in mod.parameters():
            para.requires_grad = state
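
For context, a hypothetical call site (self.backbone is a placeholder name, not from my actual code):

# Freeze the backbone at the start of training...
self.set_grads(self.backbone, False)
# ...and later unfreeze it to train everything.
self.set_grads(self.backbone, True)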

By the way, I set a flag (True or False) to select between the different losses, but with it I get an out-of-memory error; when the flag is removed, everything is fine.

I wonder why that is.

Looking forward to your reply :).

Thanks. I don’t know the exact situation because I can’t see the whole code, but how about using detach() or torch.no_grad() to freeze part of your model? I think it is a safer approach.
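
For illustration, a minimal sketch of both options; the frozen_part / trainable_part split and the sizes are placeholders, not from your code:

import torch
import torch.nn as nn

frozen_part = nn.Linear(16, 16)    # placeholder for the frozen layers
trainable_part = nn.Linear(16, 4)  # placeholder for the rest

x = torch.randn(8, 16)

# Option 1: detach() cuts the graph after the frozen part,
# so no gradients flow back into it.
h = frozen_part(x).detach()
out = trainable_part(h)

# Option 2: torch.no_grad() skips graph construction for the
# frozen part entirely, which also saves activation memory.
with torch.no_grad():
    h = frozen_part(x)
out = trainable_part(h)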

Thank you for your reply!

detach() will create a new tensor for the modules that do not require gradients, which seems even less suitable here. As for torch.no_grad(), it seems it can only be applied to a fixed forward pass like the one below. If I need to unfreeze the model during training, how should I handle torch.no_grad()?

class xxnet(nn.Module):
    def __init__(self):
        super().__init__()
        ...
        self.layer1 = xx
        self.layer2 = xx
        self.fc = xx

    def forward(self, x):
        with torch.no_grad():
            x = self.layer1(x)
            x = self.layer2(x)
        x = self.fc(x)
        return x

Thanks for patiently answering my questions :)

Then, consider using detach_(), which is an in-place operation.

If you want torch.no_grad() but also need to unfreeze the model during training, how about something like this:

model = xxnet()
if freeze:
    with torch.no_grad():
        out = model(input_tensor)
else:
    out = model(input_tensor)
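
Equivalently, torch.set_grad_enabled takes the flag directly, so one code path covers both cases. A minimal sketch, assuming the same freeze flag (the model and input below are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(16, 4)           # placeholder model
input_tensor = torch.randn(8, 16)  # placeholder input
freeze = True

# Gradients are tracked only when freeze is False.
with torch.set_grad_enabled(not freeze):
    out = model(input_tensor)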

Hello, if you use torch.no_grad(), does it use the same amount of memory? That line freezes all gradients.

It reduces memory consumption, because the intermediate forward activations needed to compute gradients are not saved.
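
A quick way to see this yourself, assuming a CUDA device is available (the layer sizes below are arbitrary):

import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda()
x = torch.randn(64, 4096, device="cuda")

torch.cuda.reset_peak_memory_stats()
y = model(x)  # intermediate activations are kept for backward
print("with grad:   ", torch.cuda.max_memory_allocated() / 1024 ** 2, "Mb")

del y
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    y = model(x)  # intermediate activations are freed immediately
print("without grad:", torch.cuda.max_memory_allocated() / 1024 ** 2, "Mb")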


Thanks a lot, I used this structure and it solved my problem! :)