Using torch.no_grad inside forward function

    def forward(self, x):
        x = self.model(x)
        with torch.no_grad():
            x = x.view(-1,150,1024,size,size)
        x = self.c1(x)
        with torch.no_grad():
            x = x[0][0][:,-1,:,:,:]
        x = x.view(-1,512*size*size)
        x = self.dp(self.relu(self.fc1(x)))
        x = self.dp(self.relu(self.fc2(x)))
        x = self.dp(self.fc3(x))
        return x

Is this the correct way to reshape the tensor inside the forward function, or is there a better way?
Thanks in advance.

I must admit I have a hard time imagining when it would be a good idea to put just these operations into torch.no_grad.

Best regards

Thomas

Sorry, I am new to programming and was not sure about this. So can I use it like this?
Regards
Sanchit

Yeah, well. I would consider it likely that it will not achieve what you intend, but without knowing anything about what you actually want to do here, it is hard to say.

Best regards

Thomas

I want to know the best way to reshape tensors inside the forward function, as they take up a lot of memory.

with torch.no_grad():
    x = x.view(-1,150,1024,size,size)

or
x = x.view(-1,150,1024,size,size).detach()
I am not sure which one is better, or whether they are both the same, but the time it took to process one batch was 1.7 s in the first case and 2.2 s in the second. Also, how will backpropagation take place in each case? Sorry for not making it clear.

If you want backpropagation, you need to stay clear of both no_grad and detach.
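Just as a minimal sketch (a toy tensor, not your model) of what the two do to the graph:

import torch

x = torch.randn(2, 3, requires_grad=True)

a = x.view(6)                 # still connected to the graph
print(a.grad_fn)              # prints a ViewBackward grad_fn, so gradients can flow

with torch.no_grad():
    b = x.view(6)             # cut off from the graph
print(b.grad_fn)              # None, so nothing will flow back through b

c = x.view(6).detach()        # also cut off from the graph
print(c.grad_fn)              # None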

Benchmarking tends to be a bit delicate due to the asynchronous nature of CUDA, so you would need to take care of synchronization before measurement.
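Something along these lines, assuming a CUDA device is available (the model here is just a stand-in for illustration):

import time
import torch

model = torch.nn.Linear(1024, 1024).cuda()     # stand-in model
batch = torch.randn(64, 1024, device="cuda")   # stand-in input

torch.cuda.synchronize()           # wait for any pending GPU work first
start = time.time()

out = model(batch)                 # the forward pass you want to time

torch.cuda.synchronize()           # wait until the GPU has actually finished
print(time.time() - start)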

Generally, views share memory with the tensor they are viewing, so you would not have to worry about the memory.
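You can check this for yourself on a small tensor; a view points at the same storage as the original, so no copy is made:

import torch

x = torch.randn(4, 6)
y = x.view(2, 12)

print(x.data_ptr() == y.data_ptr())   # True: both share the same underlying memory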

Best regards

Thomas

Hi @sanchit2843,

I believe what Thomas meant in one of the replies is:

You should separate forward and backward phases.

Inside forward there is no gradient calculation; the graph is only being recorded. Forward is one side of the PyTorch coin and backward is the other. The backward phase is where the gradients are calculated.

And usually this looks like a backward call on the loss, say loss.backward().
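As a rough sketch of the usual pattern, with a tiny stand-in model (the names here are just placeholders for your own model, loss and optimizer):

import torch

model = torch.nn.Linear(4, 1)                              # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()

input = torch.randn(8, 4)
target = torch.randn(8, 1)

output = model(input)              # forward: builds the graph, no gradients yet
loss = criterion(output, target)   # still forward
loss.backward()                    # backward: gradients are computed for the leaves
optimizer.step()                   # parameters are updated from their .grad
optimizer.zero_grad()              # clear gradients for the next iteration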
You can always check what is going on by asking for the gradient of a tensor.

If you have a tensor t, you can ask for t.grad. A tensor and its gradient have the same shape, in case the gradient is not None.

Here is an example for you. You create a tensor x and you inspect three more things about it: its grad, its grad_fn, and whether it is a leaf:

x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad)
print(x.grad_fn)
print(x.is_leaf)

And the output of this will be:


tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None
None
True

Now, you create y:

y = x + 2

print(y)
print(y.grad)
print(y.grad_fn)
print(y.is_leaf)

Out:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
None
<AddBackward0 object at 0x0000028C403BC198>
False

So what you get is that there is no gradient in y. Can you tell me why?

But let’s go further.


loss = torch.mean(y)
loss.backward()

print(y)
print(y.grad)
print(y.grad_fn)
print(y.is_leaf)

Out:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
None
<AddBackward0 object at 0x0000028C411582E8>
False

Still, no gradient in y.
But look, there is no gradient in loss either:


print(loss)
print(loss.grad)
print(loss.grad_fn)
print(loss.is_leaf)

Out:

tensor(3., grad_fn=<MeanBackward0>)
None
<MeanBackward0 object at 0x0000028C403BC6D8>
False

So where is the gradient?


print(x)
print(x.grad)
print(x.grad_fn)

Out:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])
None

It is calculated for x.

Now you will probably go to the definition of backward() and find out it calculates the gradients for the inputs.

Computes the sum of gradients of given tensors w.r.t. graph leaves.

So, by definition, backward computes the gradients for the leaves.
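As a side note, if you really want the gradient of a non-leaf tensor such as y, you can ask autograd to keep it with retain_grad():

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()        # keep the gradient for this non-leaf tensor

loss = torch.mean(y)
loss.backward()

print(y.grad)          # now filled: tensor([[0.2500, 0.2500], [0.2500, 0.2500]])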

One example where you can set the gradients manually:

x = torch.ones(2, 2, requires_grad=True)
y = x * 2

y.backward(gradient=torch.tensor([[0.5,1.],[1.5,2.]]))

print(x)
print(x.grad)
print(x.grad_fn)
print(x.is_leaf)

Out:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[1., 2.],
        [3., 4.]])
None
True

This shows clearly that the backward computation is a little more complicated than you may expect: the gradient argument acts as the vector in a vector-Jacobian product, so with y = 2 * x the result in x.grad is simply 2 times whatever you pass in.
You can set the gradients manually, but this cannot be done unless there is a graph behind it.

Here is what I mean by "unless there is a graph behind it". If you try this yourself:


a = 1
b = 1
c = 1
d = 1

def f(a, b, c, d):
    return 2 * (a + b + c + d)

e = 0.01  # small step size

# finite-difference approximation of the gradient df/da
da = (f(a + e, b, c, d) - f(a, b, c, d)) / e
print(da)

This will output:

1.9999999999999574

But to calculate the gradient for a we haven't created anything like a graph. We just used calculus, with a finite-difference approximation.
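For comparison, a small sketch of the same derivative with autograd, which does record a graph during the forward computation:

import torch

a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(1.0)
c = torch.tensor(1.0)
d = torch.tensor(1.0)

f = 2 * (a + b + c + d)   # the graph is recorded here
f.backward()

print(a.grad)             # tensor(2.) -- exact, no finite-difference error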

So, I think there is probably no sense in using torch.no_grad inside forward.


Thanks a lot. It helped.