Having trouble with autograd and slicing

Hi all,

I’m quite new to torch and am struggling to understand why my code is giving an error. I’m working on a seq2seq-type model; here is the relevant code:

    def forward(self, x, y, is_eval=False, weighted=False):
        batch_size = x.size(0)
        n = x.size(1)
        y_pred = torch.zeros_like(y).cuda()
        encoder_outputs, _ = self.encoder(self.embed(x))
        decoder_outputs = torch.zeros(batch_size, n, self.decoder.output_size).cuda()
        dec_h = None

        for i in range(n):
            # Update the decoder
            embedded = self.embed(y_pred[:, i-1:i]) if i > 0 else torch.ones_like(y_pred[:, 0:1])
            decoder_outputs[:, i:i+1, :], dec_h = self.decoder(embedded, dec_h)
            prev_edge = torch.ones((batch_size, 1, 1)).cuda()  # This tracks s^(t)_i,j-1

            edge_h = None
            for j in range(i+1):
                theta, edge_h = self.edge_level(prev_edge, edge_h)
                edge_mlp_input = torch.cat((x[:, i:i+1, j:j+1],
                                            encoder_outputs[:, i:i+1, :],
                                            decoder_outputs[:, i:i+1, :]), dim=2)

                edge_prob = self.activation(self.edge_mlp(edge_mlp_input))
                if is_eval:  # sample the sigmoid
                    y_pred[:, i:i+1, j:j+1] = sample_vector(edge_prob) if not weighted else edge_prob
                    prev_edge = y_pred[:, i:i+1, j:j+1]
                else:  # training: keep the raw probability
                    y_pred[:, i:i+1, j:j+1] = edge_prob
                    prev_edge = edge_prob
        return y_pred

The Encoder and Decoder are just standard GRU RNNs from torch.

I am training with a very standard training loop of the kind you would find in any of the torch examples (so the error is not coming from the is_eval section of the forward pass), and I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [180, 1, 14]], which is output 0 of SliceBackward, is at version 106; expected version 92 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Based on the sizes, the tensor it is referring to is y_pred[:, i-1]. I can’t understand why what I am doing causes an issue, since it doesn’t appear to me that I am modifying it. Maybe there’s something I haven’t grasped here?

Many thanks,

Just as an update: I have managed to fix this issue with the following line

    embedded = self.embed(y_pred[:, i-1:i].clone()) if i > 0 else torch.ones_like(y_pred[:, 0:1])

by looking at similar issues online. But I was wondering if someone could explain why this clone() is necessary? I think my conceptual understanding of the autograd engine is letting me down here.

Many thanks,


Some operations need intermediary results to be saved during the forward pass in order to execute the backward pass. For example, the function lambda x: x.pow(2) needs to save the input x because it is needed to compute the gradient with respect to x.

When you call the forward of such an operation, the autograd engine does not copy your tensor x. Instead, it keeps something akin to a pointer to your tensor x.

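If, after performing this operation (but before calling backward), x happens to be modified in place, then we’d lose the initial value of x and the gradient calculation would be incorrect. Autograd detects this and raises the error you are seeing rather than silently computing a wrong gradient.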
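A minimal sketch of that failure mode, assuming a CPU tensor and made-up names (`x`, `a`, `b` are illustrative only, not taken from the model above):

```python
import torch

x = torch.ones(3, requires_grad=True)
a = x * 2            # non-leaf tensor, so in-place edits are allowed
b = a.pow(2)         # pow keeps a reference to `a` for its backward pass
a.add_(1)            # in-place edit bumps the version counter of `a`

try:
    b.sum().backward()
    error_message = ""
except RuntimeError as err:
    # autograd notices that `a` changed since it was saved
    error_message = str(err)

print(error_message)
```

The message should be of the same kind as the one in the question: a saved tensor is at a newer version than the one autograd expected.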

Hope this clarifies.

@Varal7 thanks, I get that part conceptually now, but this does bring two questions to my mind.

The first is: if I use .clone() to remedy this issue, does this invalidate the backpropagation?

Secondly (and this one is more of a long shot), do you have any idea why this would be the case in my code? I am essentially populating y_pred[:, i] (which is a vector) one element at a time using the loop over j. From what I can see, I don’t touch this vector after it is populated, so I can’t see why it is being modified after the loop over i moves on to the next iteration.

Many thanks,

Using .clone() does not invalidate backprop. The tensor you obtain is a brand new tensor whose storage is disconnected from the original tensor but still remembers (through the graph) how it was computed.
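A small sketch of this, with illustrative names (`x`, `h`, `saved`, `out`) that are not part of the original code: the clone gets its own storage, so a later in-place edit of the source no longer disturbs what autograd saved, while gradients still reach `x` through the clone.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
h = x * 2
saved = h.clone()    # new storage, still connected to h in the graph
out = saved.pow(2)   # pow saves `saved`, not `h`
h.mul_(10)           # in-place edit of h does not touch the clone

out.sum().backward() # no error: the saved tensor is unchanged
print(x.grad)        # gradient of (2x)^2 with respect to x, i.e. 8x
```

Here the gradient is computed from the values at the time of the clone, which is what the forward pass actually used.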

I skimmed through your code. I think you want to use a list of lists of lists for y_pred and only turn it into a tensor at the end, just before returning.
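On the question of why the slice counts as modified: as far as I can tell, every slice of y_pred is a view that shares one version counter with y_pred itself, so assigning into y_pred[:, i:i+1, j:j+1] also marks the earlier slice y_pred[:, i-1:i] as changed, even though its values are untouched. The sketch below is a toy reduction, not the original model: `w`, `forward_buffer`, and `forward_list` are hypothetical stand-ins, and a single multiply plus sigmoid replaces the embed/decoder/MLP stack.

```python
import torch

torch.manual_seed(0)
batch, n = 2, 3
w = torch.randn(1, requires_grad=True)  # hypothetical stand-in for model parameters

def forward_buffer():
    # Writes each step into a preallocated tensor, like y_pred in the question.
    y = torch.zeros(batch, n)
    prev = torch.ones(batch, 1)
    for i in range(n):
        if i > 0:
            prev = y[:, i-1:i]                # view of y, saved by the multiply below
        y[:, i:i+1] = torch.sigmoid(prev * w) # in-place write bumps y's shared version
    return y

def forward_list():
    # Collects each step in a Python list and concatenates once at the end.
    cols = []
    prev = torch.ones(batch, 1)
    for i in range(n):
        if i > 0:
            prev = cols[-1]
        cols.append(torch.sigmoid(prev * w))
    return torch.cat(cols, dim=1)

try:
    forward_buffer().sum().backward()
    buffer_failed = False
except RuntimeError:
    buffer_failed = True

w.grad = None                                 # discard any partial gradient
out = forward_list()
out.sum().backward()
print(buffer_failed, out.shape, w.grad is not None)
```

For a 3-D y_pred the same idea applies with nested lists and torch.cat or torch.stack along each dimension. The list version avoids in-place writes altogether, so no clone() is needed.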