Having a problem with an in-place operation

Good afternoon everyone,

I am new to PyTorch and I am having trouble with my code. The problem is that one of the operations I am using seems to be in-place, but I can't figure out how to fix it.

        # Create a tensor to store outputs during the Forward
        logits = torch.zeros(self.seq_len, self.batch_size, self.vocab_size).to(device)

        # initialize tensors needed for hidden calculation
        reset = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        forget = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        memory_temp = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        one_matrix = torch.ones(self.num_layers, self.batch_size, self.hidden_size)

        # https://discuss.pytorch.org/t/how-to-copy-a-variable-in-a-network-graph/1603/6
        # For each time step
        # https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py
        for timestep in range(self.seq_len):
            # Apply dropout on the embedding result
            input_ = self.dropout(embed_out[timestep])
            for layer in range(self.num_layers):
                # operations
                reset[layer] = torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1)))
                forget[layer] = torch.sigmoid(self.z[layer](torch.cat([input_, hidden[layer]], 1)))
                memory_temp[layer] = torch.tanh(self.h[layer](torch.cat([input_, torch.mul(reset[layer], one_matrix[layer])], 1)))
                hidden[layer] = torch.mul((one_matrix[layer]-forget[layer]), one_matrix[layer]) + torch.mul(reset[layer], memory_temp[layer])

                # Apply dropout on this layer, but not for the recurrent units
                input_ = self.dropout(hidden[layer])

            # Store the output of the time step
            logits[timestep] = self.out_layer(input_)

The error message is as follows:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 200]], which is output 0 of SelectBackward, is at version 70; expected version 69 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
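For reference, the hint at the end of the message refers to torch.autograd.set_detect_anomaly(True). A minimal, self-contained sketch of what it does (not my actual model, just a toy reproduction of the same class of error):

    import torch

    # With anomaly detection on, the backward error also prints a traceback
    # pointing at the forward operation whose output was later modified in place.
    torch.autograd.set_detect_anomaly(True)

    x = torch.randn(3, 4, requires_grad=True)
    y = torch.sigmoid(x)   # sigmoid's backward needs its own output y
    y[0] = 0.0             # in-place write into that saved output
    y.sum().backward()     # raises the same "modified by an inplace operation" error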

Thanks a lot for helping me out!

Francois

Hi,

The in-place operation happens when you do reset[layer] = XXX. This writes a subset of the Tensor in place.
You can avoid this by making reset a list instead.
And if you later need to convert the contents of that list into a single Tensor, you can do torch.stack(reset, dim=0).
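Roughly like this, reusing the names from your snippet (just a sketch of the per-layer loop, not tested):

    # Collect the per-layer results in a Python list instead of writing into
    # a pre-allocated tensor; appending to a list is not an in-place Tensor op.
    reset = []
    for layer in range(self.num_layers):
        reset.append(torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1))))

    # If you later need one tensor of shape (num_layers, batch_size, hidden_size):
    reset_tensor = torch.stack(reset, dim=0)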

Thanks Alban for the quick answer!

It solves part of my problem, which is perfect. I made some changes to my code to fix this. However, I know that I am still doing an in-place operation on the memory and/or hidden lines of code, and I don't have a clue how to fix it.

# operations
reset = torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1)))
forget = torch.sigmoid(self.z[layer](torch.cat([input_, hidden[layer]], 1)))
memory_temp = torch.tanh(self.h[layer](torch.cat([input_, torch.mul(reset, hidden[layer])], 1)))
hidden[layer] = torch.mul((one_matrix-forget), hidden[layer]) + torch.mul(reset, memory_temp)

Would it make sense to have the formulas for memory_temp and hidden[layer] depend on a clone() of hidden? I call clone() just before those operations and it seems to work. However, would that still work with backprop?
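Concretely, what I tried looks roughly like this (hidden_prev is just the name I gave the clone):

    # clone() gives a copy with its own storage that autograd still tracks,
    # so overwriting hidden[layer] afterwards no longer touches the tensor
    # that the sigmoid/tanh/mul ops above saved for backward.
    hidden_prev = hidden[layer].clone()
    reset = torch.sigmoid(self.r[layer](torch.cat([input_, hidden_prev], 1)))
    forget = torch.sigmoid(self.z[layer](torch.cat([input_, hidden_prev], 1)))
    memory_temp = torch.tanh(self.h[layer](torch.cat([input_, torch.mul(reset, hidden_prev)], 1)))
    hidden[layer] = torch.mul((one_matrix - forget), hidden_prev) + torch.mul(reset, memory_temp)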

Thanks again for your help!

If the autograd does not raise an error, then the backprop will work :slight_smile:

Here, you can make hidden a list and just append to it, no?
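Something along these lines, reusing your names (a rough sketch, not tested; the per-layer entries of the hidden list are rebound each step, the timestep outputs are collected in a list, and one_matrix is torch.ones to match its name):

    # `hidden` as a Python list with one tensor per layer: rebinding an entry
    # is plain Python assignment, not an in-place write into a tracked Tensor.
    hidden = [torch.zeros(self.batch_size, self.hidden_size).to(device)
              for _ in range(self.num_layers)]
    one_matrix = torch.ones(self.batch_size, self.hidden_size).to(device)

    logits = []
    for timestep in range(self.seq_len):
        input_ = self.dropout(embed_out[timestep])
        for layer in range(self.num_layers):
            reset = torch.sigmoid(self.r[layer](torch.cat([input_, hidden[layer]], 1)))
            forget = torch.sigmoid(self.z[layer](torch.cat([input_, hidden[layer]], 1)))
            memory_temp = torch.tanh(self.h[layer](torch.cat([input_, torch.mul(reset, hidden[layer])], 1)))
            # the new tensor replaces the old list entry; nothing is modified in place
            hidden[layer] = torch.mul((one_matrix - forget), hidden[layer]) + torch.mul(reset, memory_temp)
            input_ = self.dropout(hidden[layer])
        # collect the per-timestep outputs and stack them at the end
        logits.append(self.out_layer(input_))

    logits = torch.stack(logits, dim=0)  # (seq_len, batch_size, vocab_size)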