Error in backward when accumulating loss

I’m adding up multiple losses and getting this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Here is my code:

import sys
import torch
import torch.nn as nn
from torch.autograd import Variable

torch.manual_seed(777)  # reproducibility
#            0    1    2    3    4
idx2char = ['h', 'i', 'e', 'l', 'o']

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3]   # hihell
one_hot_lookup = [[1, 0, 0, 0, 0],  # 0
                  [0, 1, 0, 0, 0],  # 1
                  [0, 0, 1, 0, 0],  # 2
                  [0, 0, 0, 1, 0],  # 3
                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]    # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]

# As we have one batch of samples, we will change them to variables only once
inputs = Variable(torch.Tensor(x_one_hot)).to(device)
labels = Variable(torch.LongTensor(y_data)).to(device)

num_classes = 5
input_size = 5  # one-hot size
hidden_size = 5  # output from the RNN. 5 to directly predict one-hot
batch_size = 1   # one sentence
sequence_length = 1  # One by one
num_layers = 1  # one-layer rnn


class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True).to(device)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out.view(-1, num_classes)

    def init_hidden(self):
        # Initialize hidden and cell states
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size)).to(device)


# Instantiate RNN model
model = Model().to(device)
print(model)

# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss(reduction='sum').to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Train the model
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()

    sys.stdout.write("predicted string: ")
    for input, label in zip(inputs, labels):
        #print(input.size(), label.size())
        hidden, output = model(hidden, input)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.data[0]])
        label = label.unsqueeze_(dim=0)
        loss = loss + criterion(output, label) # loss accumulating

    loss.backward() # error
    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.item()))
    optimizer.step()

print("Learning finished!")

Is there any way to sum the losses and backpropagate only once, at the end of the sequence?

The error means that a tensor autograd needs for gradient computation has been modified in place. It is raised by autograd’s in-place correctness check: every tensor has a version counter, and backward() compares it against the version recorded when the tensor was saved for the backward pass.

If you change label = label.unsqueeze_(dim=0) to label = label.unsqueeze(dim=0), it should work.
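Concretely, the inner loop from your code would become (only the unsqueeze line changes):

    for input, label in zip(inputs, labels):
        hidden, output = model(hidden, input)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.data[0]])
        label = label.unsqueeze(dim=0)  # out-of-place: returns a new view, leaves labels untouched
        loss = loss + criterion(output, label)

    loss.backward()  # no longer raises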

unsqueeze_ is an in-place operation, and label is used to compute the loss that you later call backward() on. Hence the error.

As a rule, you cannot modify in place any tensor that is needed for gradient computation.
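Here is a minimal sketch of the same failure, independent of your RNN code (exp() saves its output for the backward pass, so modifying that output in place trips the version check):

    import torch

    a = torch.ones(3, requires_grad=True)
    b = a.exp()          # autograd saves exp()'s output to compute its gradient
    b.add_(1)            # in-place change bumps the saved tensor's version counter
    b.sum().backward()   # RuntimeError: one of the variables needed for gradient
                         # computation has been modified by an inplace operation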

(As an aside, the assignment in label = label.unsqueeze_(dim=0) is redundant: because unsqueeze_ operates in place, label.unsqueeze_(dim=0) alone would give the same result. Also, Variable is deprecated; plain tensors can be used on their own now. See the PyTorch 0.4 migration guide.)
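For reference, a sketch of how the setup could look without Variable, reusing the names from your code (torch.tensor takes dtype and device directly):

    inputs = torch.tensor(x_one_hot, dtype=torch.float32, device=device)
    labels = torch.tensor(y_data, dtype=torch.long, device=device)

    # and in init_hidden:
    hidden = torch.zeros(num_layers, batch_size, hidden_size, device=device)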