Hi! The @ operator (matrix multiply) is stateless and accepts 2 input tensors. During backprop, my understanding is that it computes two gradients, one w.r.t. each input tensor, and each gradient then flows back via the chain rule to update the variables along the path that produced that input.

In my experiment, I want to zero out the gradient for only one of the input tensors and keep the other as-is. My guess is that I need a custom autograd Function, but I'm fairly new to this and the toy example (PyTorch: Defining New autograd Functions — PyTorch Tutorials 1.7.0 documentation) doesn't seem sufficient. Please help!
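For concreteness, here's a minimal sketch of what I think I'm after, based on that tutorial (the `MatmulGradAOnly` class name is just mine): the forward is a normal matmul, but the backward returns the usual gradient for the first input and `None` for the second, so no gradient ever flows to it.

```python
import torch

class MatmulGradAOnly(torch.autograd.Function):
    """Matrix multiply whose backward passes a gradient only to the first input."""

    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a @ b

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        grad_a = grad_out @ b.t()  # usual matmul gradient w.r.t. a
        grad_b = None              # zero out the gradient path to b
        return grad_a, grad_b

a = torch.randn(3, 4, requires_grad=True)
b = torch.randn(4, 5, requires_grad=True)
out = MatmulGradAOnly.apply(a, b)
out.sum().backward()
# a.grad is populated as usual; b.grad stays None
```

Is this the right approach, or is there a simpler way? I noticed `a @ b.detach()` seems to achieve the same thing for this stateless case, so maybe the custom Function is overkill here.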