Having trouble with autograd and slicing

Hi all,

I’m quite new to PyTorch and am struggling to understand why my code here is giving an error. I’m working on a seq2seq-type model; here is the relevant code:

    def forward(self, x, y, is_eval=False, weighted=False):
        batch_size = x.size(0)
        n = x.size(1)
        y_pred = torch.zeros_like(y).cuda()
        encoder_outputs, _ = self.encoder(self.embed(x))
        decoder_outputs = torch.zeros(batch_size, n, self.decoder.output_size).cuda()
        dec_h = None

        for i in range(n):
            # Update the decoder
            embedded = self.embed(y_pred[:, i-1:i]) if i > 0 else torch.ones_like(y_pred[:, 0:1])
            decoder_outputs[:, i:i+1, :], dec_h = self.decoder(embedded, dec_h)
            prev_edge = torch.ones((batch_size, 1, 1)).cuda()  # This tracks s^(t)_i,j-1

            edge_h = None
            for j in range(i+1):
                theta, edge_h = self.edge_level(prev_edge, edge_h)
                edge_mlp_input = torch.cat((x[:, i:i+1, j:j+1],
                                            theta,
                                            encoder_outputs[:, i:i+1, :],
                                            decoder_outputs[:, i:i+1, :]), dim=2)

                edge_prob = self.activation(self.edge_mlp(edge_mlp_input))
                if is_eval: # sample the sigmoid
                    y_pred[:, i:i+1, j:j+1] = sample_vector(edge_prob) if not weighted else edge_prob
                    prev_edge = y_pred[:, i:i+1, j:j+1]
                else:
                    y_pred[:, i:i+1, j:j+1] = edge_prob
                    prev_edge = edge_prob
        return y_pred

The Encoder and Decoder are just standard GRU RNNs from torch.

I am training with a very standard training loop, the kind you would find in any of the PyTorch examples (so the error is not coming from the is_eval branch of the forward pass), and I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [180, 1, 14]], which is output 0 of SliceBackward, is at version 106; expected version 92 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Based on the sizes, the tensor it is referring to is y_pred[:, i-1], but I can’t understand why what I am doing is causing an issue: as far as I can tell I am not modifying it. Maybe there’s something I haven’t grasped here?

Many thanks,
Jase

Just as an update: I have managed to fix this issue with the following line

    embedded = self.embed(y_pred[:, i-1:i].clone()) if i > 0 else torch.ones_like(y_pred[:, 0:1])

after looking at similar issues online. But I was wondering if someone could explain why this .clone() is necessary? I think my conceptual understanding of the autograd engine is letting me down here.

Many thanks,
Jase

Hi,

Some operations need intermediate results to be saved during the forward pass in order to execute the backward pass. For example, the function lambda x: x.pow(2) needs to save its input x, because x is needed to compute the gradient with respect to x.

When you call the forward of such an operation, the autograd engine does not make a copy of your tensor x; it stores something akin to a pointer to your tensor x.

If, after performing this operation (but before calling backward()), x happens to be modified in place, then we’d lose the initial value of x and the gradient computation would be incorrect. Autograd detects this through a version counter on the tensor and raises the error you are seeing.
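
Here is a minimal sketch of that failure mode, separate from your model (the tensor names are made up for illustration):

    import torch

    a = torch.tensor([2.0], requires_grad=True)
    x = a * 1.0        # non-leaf tensor, so the in-place op below is allowed
    y = x.pow(2)       # pow saves x for the backward pass (dy/dx = 2*x)
    x.add_(1.0)        # in-place write: x's version counter is bumped
    y.backward()       # RuntimeError: ... has been modified by an inplace operation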

Hope this clarifies things.

@Varal7 thanks, I get that part conceptually now, but this does bring two questions to my mind.

The first is: if I use .clone() to remedy this issue, does this invalidate the backpropagation?

Secondly (and this one is more of a long shot), do you have any idea why this would be happening in my code? I am essentially populating y_pred[:, i] (which is a vector) one element at a time using the loop over j. From what I can see, I don’t touch this vector after it is populated, so I can’t see why it would be modified after the loop over i moves on?

Many thanks,
Jase

Using .clone() does not invalidate backprop. The tensor you obtain is a brand new tensor whose storage is disconnected from the original tensor but still remembers (through the graph) how it was computed.
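
For example (a small sketch, not your code):

    import torch

    a = torch.tensor([3.0], requires_grad=True)
    b = a.clone()        # new storage, but the clone op is recorded in the graph
    c = (b * 2).sum()
    c.backward()
    print(a.grad)        # tensor([2.]) -- gradients still flow back through the clone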

I skimmed through your code; I think you want to build y_pred as a nested Python list (a list of lists of lists), and only turn it into a tensor at the end, right before returning.
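
Something along these lines (a toy sketch of the pattern only, not a drop-in rewrite of your forward):

    import torch

    # Collect each step's output in a Python list and stack once at the end,
    # instead of preallocating a tensor and assigning into slices of it.
    def loop_without_slice_assignment(x, weight, n_steps=3):
        steps = []
        h = x
        for _ in range(n_steps):
            h = torch.tanh(h @ weight)   # each step is a fresh graph node
            steps.append(h)
        return torch.stack(steps, dim=1) # (batch, n_steps, hidden)

    x = torch.randn(4, 8, requires_grad=True)
    w = torch.randn(8, 8, requires_grad=True)
    out = loop_without_slice_assignment(x, w)
    out.sum().backward()                 # no in-place/version errors here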