Torch.no_grad not functioning?

When I run the code below,

    @torch.no_grad()
    def draw_landmarks(self, imgs, landmarks, colour=[255, 0, 0]):
        """draw landmarks on images inplace
        Arguments:
            imgs {[npndarray]} -- [image with shape BxCxHxW]
            landmarks {[Tensor]} -- [integers with shape BxNx2]
        Keyword Arguments:
            colour {list} -- [description] (default: {[255, 0, 0]})
        """
        with torch.no_grad():
            assert imgs.shape[0] == landmarks.shape[0]
            for i in range(imgs.shape[0]):
                img_landmarks = landmarks[i]
                lm_x, lm_y = img_landmarks[:,0], img_landmarks[:,1]
                imgs[i, :, lm_y, lm_x] = colour
            return imgs
        

the landmarks tensor carries gradient information, which isn't needed here (this is for visualization only). However, PyTorch throws "cannot call numpy on variable requires gradient" if I don't call .detach() on the input tensor.
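
For now I work around it by detaching before indexing, roughly like this (a simplified sketch of the loop body above):

    # inside the loop: detach so the indices no longer carry grad information
    lm = landmarks[i].detach()
    lm_x, lm_y = lm[:, 0], lm[:, 1]
    imgs[i, :, lm_y, lm_x] = colour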

Is this expected behaviour? If so, isn't it counter-intuitive?

Thank you!

Sorry about that. This is a bug, see https://github.com/pytorch/pytorch/issues/11390.

Please note, though, that even once the bug is fixed, landmarks will still have requires_grad set to True, because it is created before the no_grad block.
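
To illustrate what that means, a small sketch (post-bugfix behaviour, shapes made up):

    import torch

    landmarks = torch.randn(2, 4, 2, requires_grad=True)  # created outside the block
    with torch.no_grad():
        shifted = landmarks + 1     # not tracked: no graph is built here
    print(landmarks.requires_grad)  # True  -- the flag on the input is untouched
    print(shifted.requires_grad)    # False -- outputs of ops under no_grad do not require grad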

Thanks for the clarification.
So if I understand correctly, torch.no_grad is only a flag for variable creation, not for operations.
If that is the case, will there be a future function/decorator under which no operations compute gradients? That would align much better with my own (mis)understanding and seems easier to use.

No, torch.no_grad() does disable the tracking of operations. This includes in-place operations (which are still tracked when you only use .detach()) as well as the differentiation itself.
The only quirk is that requires_grad will still be set on the result of the computation, something that does not cause problems in many common cases (for example, not when evaluating networks within a with torch.no_grad(): block). As a result of the bug, a graph will be built for things following the no_grad scope even though it is not required, but no gradient will flow back to the inputs, as requested by no_grad.
You can achieve the post-bugfix behaviour today by manually calling .detach() on the inputs or setting .requires_grad_(False) on the outputs of the computation.
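
For your draw_landmarks, the first option could look roughly like this. It is only a sketch under some assumptions: I detach and cast the landmarks to long before indexing, and reshape colour to (C, 1) so it broadcasts over the N points; also, the decorator alone already covers the function body, so the inner with block isn't needed. Adjust it to your data.

    @torch.no_grad()
    def draw_landmarks(self, imgs, landmarks, colour=(255, 0, 0)):
        # sketch: detach drops the grad requirement so the values can be used as indices
        landmarks = landmarks.detach().long()
        # assumed: colour has C entries; shape (C, 1) broadcasts over the N landmark points
        colour = imgs.new_tensor(colour).view(-1, 1)
        for i in range(imgs.shape[0]):
            lm_x, lm_y = landmarks[i, :, 0], landmarks[i, :, 1]
            imgs[i, :, lm_y, lm_x] = colour  # paint the landmark pixels in place
        return imgs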

Best regards

Thomas
