During training, I want to create a tensor to save some intermediate variables, e.g. with shape [16, 512], where 16 is the length and 512 is the hidden size. When I want to get a variable from this tensor, e.g. the first hidden state, I create a one-hot mask like [1, 0, 0, …] and matrix-multiply it with this tensor to get the first hidden state saved there. In this case, will the backward pass propagate gradients to the variables that computed the first hidden state I saved in the tensor?

Yes, Autograd will be able to calculate the gradients of the operations resulting in the first hidden state.

As long as you don’t detach the computation graph, e.g. by using `tensor.data` or `tensor.detach()`, you should be fine.
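As a minimal sketch of the one-hot-mask approach (the shapes and variable names are assumptions based on your description), you can verify that gradients flow back through the matrix multiplication to the selected row only:

```python
import torch

# Hypothetical buffer: 16 hidden states of size 512.
hidden = torch.randn(16, 512, requires_grad=True)

# One-hot mask selecting the first hidden state.
mask = torch.zeros(16)
mask[0] = 1.0

# (1, 16) @ (16, 512) -> (1, 512): picks out the first row of `hidden`.
first = mask.unsqueeze(0) @ hidden

# Backprop through the selection.
first.sum().backward()

print(hidden.grad[0].sum())       # -> tensor(512.), gradient reached row 0
print(hidden.grad[1:].abs().sum())  # -> tensor(0.), other rows untouched
```

Since the mask is zero everywhere except index 0, only the first row of `hidden` receives a gradient, which is exactly the behavior you want.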

PS: Alternatively, slicing might be simpler, but I’m not sure if that’s suitable for your use case.

Thank you for your help. Actually, I’m not sure I understand the slicing operation in your answer. Could you give an example of it?

I thought that maybe instead of multiplying your tensor with a one-hot mask, you could simply slice the desired location, e.g.:

```python
# output = tensor * mask   # one-hot mask approach
output = tensor[0, :]      # equivalent: slice the first row directly
```

But as I’ve mentioned, I’m not sure if that would be suitable for your use case.
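To illustrate that the slice behaves the same for autograd (a minimal sketch; the [16, 512] shape is taken from the question above), slicing is differentiable and routes the gradient to the sliced row only:

```python
import torch

hidden = torch.randn(16, 512, requires_grad=True)

# Slice the first hidden state instead of multiplying by a one-hot mask.
first = hidden[0, :]

first.sum().backward()

# Gradient flows to row 0 only, just like with the mask approach.
print(hidden.grad[0].eq(1.0).all())   # -> tensor(True)
print(hidden.grad[1:].eq(0.0).all())  # -> tensor(True)
```

So as long as the index you want is known (or computable) at runtime, indexing avoids building the mask and the extra matmul.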