Having a problem with an in-place operation

Good afternoon everyone,

I am new to PyTorch and I am having trouble with my code. It seems that one of the operations I am using is in-place, but I can't figure out how to fix it.

        # Create a tensor to store outputs during the Forward
        logits = torch.zeros(self.seq_len, self.batch_size, self.vocab_size).to(device)

        # initialize tensors needed for hidden calculation
        reset = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        forget = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        memory_temp = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        one_matrix = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)

        # https://discuss.pytorch.org/t/how-to-copy-a-variable-in-a-network-graph/1603/6
        # For each time step
        # https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py
        for timestep in range(self.seq_len):
            # Apply dropout on the embedding result
            input_ = self.dropout(embed_out[timestep])
            for layer in range(self.num_layers):
                # operations
                reset[layer] = torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1)))
                forget[layer] = torch.sigmoid(self.z[layer](torch.cat([input_, hidden[layer]], 1)))
                memory_temp[layer] = torch.tanh(self.h[layer](torch.cat([input_, torch.mul(reset[layer], one_matrix[layer])], 1)))
                hidden[layer] = torch.mul((one_matrix[layer]-forget[layer]), one_matrix[layer]) + torch.mul(reset[layer], memory_temp[layer])

                # Apply dropout on this layer, but not for the recurrent units
                input_ = self.dropout(hidden[layer])

            # Store the output of the time step
            logits[timestep] = self.out_layer(input_)

The error message is as follows:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 200]], which is output 0 of SelectBackward, is at version 70; expected version 69 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
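For anyone hitting the same message, here is a minimal standalone sketch (the names `w` and `buf` are made up, not from the model above) that reproduces this class of error: writing into a slice of a tensor after autograd has saved that slice for the backward pass.

```python
import torch

w = torch.ones(3, requires_grad=True)
buf = torch.zeros(2, 3)

buf[0] = w * 2            # in-place write into a slice of buf
y = (buf[0] ** 2).sum()   # backward of ** saves buf[0] at its current version
buf[0] = w * 3            # second in-place write bumps buf's version counter

raised = False
try:
    y.backward()          # saved tensor is now stale -> RuntimeError
except RuntimeError as e:
    raised = True
    print(e)

print(raised)  # True
```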

Thanks a lot for helping me out!



The in-place operation happens when you do reset[layer] = XXX. This writes a subset of the Tensor in place.
You can avoid this by making reset a list instead.
And if you later need the contents of that list as a single Tensor, you can do torch.stack(reset, dim=0).
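A minimal sketch of that pattern (the `linear` layer and shapes here are illustrative, not the poster's model): append each step's result to a Python list instead of writing into a preallocated tensor, then stack at the end.

```python
import torch

seq_len, batch, hidden_size = 3, 2, 4
linear = torch.nn.Linear(hidden_size, hidden_size)
h = torch.zeros(batch, hidden_size)

outputs = []                      # collect per-step results in a list
for t in range(seq_len):
    h = torch.sigmoid(linear(h))  # rebinding h creates a new tensor
    outputs.append(h)             # no in-place write, so autograd is happy

logits = torch.stack(outputs, dim=0)  # (seq_len, batch, hidden_size)
logits.sum().backward()               # backprop runs without error
print(logits.shape)  # torch.Size([3, 2, 4])
```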

Thanks Alban for the quick answer!

That solves part of my problem, which is perfect. I made some changes to my code to fix it. However, I know I am still doing an in-place operation on the memory and/or hidden line of code, and I don't have a clue how to fix it.

# operations
reset = torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1)))
forget = torch.sigmoid(self.z[layer](torch.cat([input_, hidden[layer]], 1)))
memory_temp = torch.tanh(self.h[layer](torch.cat([input_, torch.mul(reset, hidden[layer])], 1)))
hidden[layer] = torch.mul((one_matrix-forget), hidden[layer]) + torch.mul(reset, memory_temp)

Would it make sense for the memory_temp and hidden[layer] formulas to depend on a clone() of hidden? I called clone() just before those operations and it seems to work. However, would that still work with backprop?

Thanks again for your help!

If autograd does not raise an error, then backprop will work :slight_smile:

Here, can't you make hidden a list as well and just rebind or append its entries?
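A sketch of what that could look like (the `cells` ModuleList and shapes are hypothetical stand-ins for the model above): keep the per-layer states in a plain Python list, so that `hidden[layer] = new_tensor` rebinds the list entry rather than writing into a tensor in place.

```python
import torch

num_layers, batch, hidden_size = 2, 2, 4
cells = torch.nn.ModuleList(
    torch.nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)
)

# Python list of per-layer states; assignment below replaces the list
# element with a brand-new tensor, which is not an in-place operation.
hidden = [torch.zeros(batch, hidden_size) for _ in range(num_layers)]

for step in range(3):
    for layer in range(num_layers):
        hidden[layer] = torch.tanh(cells[layer](hidden[layer]))

torch.stack(hidden).sum().backward()  # backprop runs without error
print(hidden[0].shape)  # torch.Size([2, 4])
```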