loss.backward() called inside torch.no_grad()

Hi, I’m working my way through the Dive into Deep Learning book and came across the following code (pasted below). I’m confused about how this function still trains the linear regression model, since it calls loss.backward() inside torch.no_grad().
Thank you so much for your help!

@d2l.add_to_class(d2l.Trainer)  #@save
def fit_epoch(self):
    self.model.train()
    for batch in self.train_dataloader:
        loss = self.model.training_step(self.prepare_batch(batch))
        self.optim.zero_grad()
        with torch.no_grad():
            loss.backward()
            if self.gradient_clip_val > 0:  # To be discussed later
                self.clip_gradients(self.gradient_clip_val, self.model)
            self.optim.step()
        self.train_batch_idx += 1
    if self.val_dataloader is None:
        return
    self.model.eval()
    for batch in self.val_dataloader:
        with torch.no_grad():
            self.model.validation_step(self.prepare_batch(batch))
        self.val_batch_idx += 1

The code does indeed look a bit confusing:

        with torch.no_grad():
            loss.backward()
            if self.gradient_clip_val > 0:  # To be discussed later
                self.clip_gradients(self.gradient_clip_val, self.model)
            self.optim.step()

You don’t need to call loss.backward() and self.optim.step() inside a no_grad context.
I don’t know whether self.clip_gradients manipulates the parameters in place (I would not think so), so my guess is that the no_grad context can be removed.
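For reference, here is a minimal sketch of the conventional training step without the no_grad wrapper. The model, data, and optimizer are toy stand-ins, and torch.nn.utils.clip_grad_norm_ is only assumed to play the role of self.clip_gradients here:

import torch
from torch import nn

# Toy setup just for illustration (not the book's Trainer class).
model = nn.Linear(3, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
gradient_clip_val = 1.0
x, y = torch.randn(8, 3), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)  # forward pass records the graph
optim.zero_grad()
loss.backward()  # no no_grad needed: gradients are written to the .grad attributes
if gradient_clip_val > 0:
    # stand-in for self.clip_gradients; it only touches the .grad tensors
    torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clip_val)
optim.step()  # the optimizer updates parameters without recording a graph anyway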

Hi, thank you so much for your help. Could you help me understand how the original function can still train a model even though torch.no_grad() is used? Thank you!!

The forward pass is executed outside the no_grad context, so the computation graph is created and all intermediate activations needed for the gradient computation are saved. torch.no_grad() only stops new operations from being recorded; it does not affect the backward pass through a graph that already exists, so loss.backward() still computes and accumulates gradients as usual.
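A tiny standalone example (not from the book) that demonstrates this: the graph is built during the forward pass, so calling backward() inside a no_grad block still produces gradients.

import torch

x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)

# Forward pass outside no_grad: autograd records the graph.
loss = (x @ w).pow(2).mean()

with torch.no_grad():
    # no_grad only stops *new* operations from being tracked;
    # backward() replays the already-recorded graph, so gradients still flow.
    loss.backward()

print(w.grad)  # populated, despite the no_grad context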

Thank you so much! This is very helpful!