On the usage of torch.no_grad()

This is probably a pretty basic question, but I have not been able to find the solution.

I am using torch.no_grad() during inference to avoid keeping track of the history. The problem is that memory increases by about 5 GB during this stage. I am only able to solve it using .data. That was the classical way I solved it, but I would like to move to the newer approaches.

    # memory increases a lot
    with torch.no_grad():
        for idx, (x, t) in enumerate(data_valid):
            x, t = x.cuda(), t.cuda()
            out = net.forward_test(x)
            MC_valid += net.classification_error(out, t)

    # memory is OK
    for idx, (x, t) in enumerate(data_valid):
        x, t = x.cuda(), t.cuda()
        out = net.forward_test(x)
        MC_valid += net.classification_error(out, t).data
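For reference, the newer pattern I know would be to accumulate a plain Python number with .item() instead of a tensor. Just a sketch, assuming classification_error returns a scalar tensor:

    # memory is also OK, without touching .data
    net.eval()  # make sure every submodule is in eval mode
    MC_valid = 0.0
    with torch.no_grad():
        for idx, (x, t) in enumerate(data_valid):
            x, t = x.cuda(), t.cuda()
            out = net.forward_test(x)
            # .item() converts the scalar tensor to a Python float,
            # so nothing autograd-related is kept alive across batches
            MC_valid += net.classification_error(out, t).item()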

Hi,

torch.no_grad() takes no parameters and will raise an error if you give it one. So maybe you have an old version of PyTorch?

Sorry, it was a mistake; I do not pass a parameter. I am using 1.0.0 with Python 3.7. Should memory increase during inference if we use torch.no_grad()? I would expect that only if the batch size used during inference is bigger.

If you use the same batch_size, memory usage should be similar to (slightly lower than) running without it.
Given the place of the .data in the code above, could you check whether MC_valid has requires_grad=True in both cases? Is it possible that something else set the requires_grad state?
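Something like this inside your loop would show it (just a sketch, reusing the names from your first post):

    with torch.no_grad():
        for idx, (x, t) in enumerate(data_valid):
            x, t = x.cuda(), t.cuda()
            out = net.forward_test(x)
            err = net.classification_error(out, t)
            # Both should be False inside torch.no_grad(); if either prints
            # True, something re-enabled gradient tracking internally.
            print(out.requires_grad, err.requires_grad)
            break  # one batch is enough for the check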

I am going to inspect the model (it was not written by me). It seems that maybe somewhere in the model the mode is changed to train instead of eval.

Even if we set torch.no_grad(), if the model is run in training mode (model.train()), are the gradients stored?

The thing is that torch.no_grad() is a convenient way to disable gradient computation and restore the previous state when the block exits. But if something calls set_grad_enabled(True) directly inside it, that will nullify the effect of your torch.no_grad() block :confused:

That being said, this is quite unlikely. Did you check whether the outputs require gradients? If they don't, this is not the problem.
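Note that model.train() by itself does not bring gradients back: training mode (dropout/batchnorm behaviour) and grad mode (autograd tracking) are independent. A quick self-contained check with a toy layer:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    model.train()  # training *mode*: changes dropout/batchnorm behaviour only

    with torch.no_grad():  # grad *mode*: disables graph construction
        out = model(torch.randn(3, 4))

    print(out.requires_grad)  # False: train() did not re-enable autograd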

I have been checking the previous implementation, and the problem was that some layers had their training mode set to True. So it might be what you describe, that set_grad_enabled(True) was being called internally. That was one of the things I did not know: whether torch.no_grad() had maximum priority or not.

They all have the same priority! So if you do

with torch.no_grad():
    torch.set_grad_enabled(True)
    # Some code

this will be completely the same as not using the no_grad() at all.
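You can see it directly (the print shows the effective grad state):

    import torch

    x = torch.randn(2, requires_grad=True)

    with torch.no_grad():
        torch.set_grad_enabled(True)  # overrides the enclosing no_grad()
        y = x * 2
        print(y.requires_grad)  # True: gradients are tracked again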
