Update multiple models in each batch: "modified by an inplace operation" error

I am trying to train multiple models within each batch, since the loss functions are based on the results of all models. Each model has its own dataset and optimizer. However, when I call backward() on the second model, a "modified by an inplace operation" error occurs. I did set detect_anomaly to True, but the returned traceback makes no sense to me. Here is the code:

    for epoch in range(num_epochs):
        for i in range(len(splitted_v1)):
            X1_ = model_v1(splitted_v1[i].float())
            S1 = similarity_matrix(X1_)

            X2_ = model_v2(splitted_v2[i].float())
            S2 = similarity_matrix(X2_)

            local_s = (S1 + S2) / 2

            loss_v1 = custom_loss(X1_, splitted_v1[i], S1, local_s)
            optimizer_v1.zero_grad()
            loss_v1.backward(retain_graph=True)
            optimizer_v1.step()

            loss_v2 = custom_loss(X2_, splitted_v2[i], S2, local_s)
            optimizer_v2.zero_grad()
            loss_v2.backward(retain_graph=True)
            optimizer_v2.step()

Here is the traceback:

    Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
      File "/Users/xx/Desktop/cancer_research/synthetic_test.py", line 336, in <module>
        X1_ = model_v1(splitted_v1[i].float())
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/Users/xx/Desktop/cancer_research/synthetic_test.py", line 102, in forward
        x = self.decoder(x)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
        input = module(input)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1610, in linear
        ret = torch.addmm(bias, input, weight.t())
    (print_stack at /Users/distiller/project/conda/conda-bld/pytorch_1587428061935/work/torch/csrc/autograd/python_anomaly_mode.cpp:60)
    Traceback (most recent call last):
      File "/Users/xx/Desktop/cancer_research/synthetic_test.py", line 351, in <module>
        loss_v2.backward(retain_graph=True)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/Users/xx/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
        allow_unreachable=True)  # allow_unreachable flag

What confuses me is how the call on model_v1 could influence the variables of model_v2.

I am new to PyTorch. Any help would be appreciated!

Are the model updates working separately, i.e. does the issue only arise in this combined setup?
Also, is similarity_matrix or one of the models manipulating some tensors in-place?
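
One thing that might be worth checking, since local_s mixes the outputs of both models: optimizer_v1.step() updates model_v1's parameters in-place, and loss_v2 still backpropagates through S1 into model_v1. I'm not sure this is the cause here, but a variant of your loop that runs both backward passes before either optimizer step would rule it out (same names as in your first post, just reordered):

    for epoch in range(num_epochs):
        for i in range(len(splitted_v1)):
            X1_ = model_v1(splitted_v1[i].float())
            S1 = similarity_matrix(X1_)
            X2_ = model_v2(splitted_v2[i].float())
            S2 = similarity_matrix(X2_)
            local_s = (S1 + S2) / 2

            loss_v1 = custom_loss(X1_, splitted_v1[i], S1, local_s)
            loss_v2 = custom_loss(X2_, splitted_v2[i], S2, local_s)

            optimizer_v1.zero_grad()
            optimizer_v2.zero_grad()
            # accumulate gradients from both losses before any in-place parameter update
            loss_v1.backward(retain_graph=True)
            loss_v2.backward()
            optimizer_v1.step()
            optimizer_v2.step()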


Thank you for the reply!
Yes, they work fine separately during the initialization step, and I don't think I use any in-place operations in similarity_matrix() or custom_loss().
I initialize each model on its own like this:

    print("initializing model_v1")
    for epoch in range(5):

        for i in range(len(splitted_v1)):
            train = splitted_v1[i]
            # ===================forward=====================
            output = model_v1(train.float())
            loss = torch.norm(output-train, p='fro')
            # ===================backward====================
            optimizer_v1.zero_grad()
            loss.backward()
            optimizer_v1.step()
        # ===================log========================
        print('epoch [{}/{}], loss:{:.4f}'
              .format(epoch + 1, 5, loss))
        lr_scheduler_v1.step()

    print("initializing model_v2")
    for epoch in range(5):

        for i in range(len(splitted_v2)):
            train = splitted_v2[i]
            # ===================forward=====================
            output = model_v2(train.float())
            loss = torch.norm(output - train, p='fro')
            # ===================backward====================
            optimizer_v2.zero_grad()
            loss.backward()
            optimizer_v2.step()
        # ===================log========================
        print('epoch [{}/{}], loss:{:.4f}'
              .format(epoch + 1, 5, loss))
        lr_scheduler_v2.step()

Here is similarity_matrix:

    def similarity_matrix(y_pred):
        # S = \hat{X}_k \hat{X}_k^T / p_k
        s = torch.mm(y_pred, y_pred.t()) / y_pred.size()[1]
        return s
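
With a random input it just produces an n × n scaled Gram matrix, for example (arbitrary shapes for illustration):

    x = torch.randn(8, 16)    # 8 samples, 16 features (arbitrary)
    s = similarity_matrix(x)
    print(s.shape)            # torch.Size([8, 8]); s is symmetric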

Here is custom_loss:

    def custom_loss(y_pred, y_true, s_v, local_s):
        S = similarity_matrix(y_pred)
        loss = torch.tensor(0, requires_grad=True, dtype=float)
        for i in range(S.size(0)):
            x = S[i] + sys.float_info.epsilon
            temp = x * torch.log(x) + (1 - x) * torch.log(1 - x)
            loss = loss - temp.sum()

        loss3 = loss / y_pred.size(0)
        # Frobenius norm || \hat{X}_k - X_k ||^2_F
        loss1 = torch.norm(y_pred - y_true, p='fro')

        loss2 = torch.norm(s_v - local_s, p='fro')
        total_loss = loss1 + loss2 + loss3
        return total_loss

Thanks for the code. Sorry, but I cannot find the error.
Could you post a minimal, executable code snippet using random input tensors, so that we could debug it, please?
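
Something along these lines would already help (a rough skeleton only; the model definitions, shapes, and optimizers below are placeholders to be replaced with your actual setup):

    import torch
    import torch.nn as nn

    torch.autograd.set_detect_anomaly(True)

    # stand-ins for model_v1 / model_v2 and the data splits (placeholder shapes)
    model_v1 = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 20))
    model_v2 = nn.Sequential(nn.Linear(30, 10), nn.ReLU(), nn.Linear(10, 30))
    splitted_v1 = [torch.randn(16, 20) for _ in range(4)]
    splitted_v2 = [torch.randn(16, 30) for _ in range(4)]
    optimizer_v1 = torch.optim.Adam(model_v1.parameters())
    optimizer_v2 = torch.optim.Adam(model_v2.parameters())

    # ... followed by similarity_matrix, custom_loss, and the training loop
    # from your first post, unchanged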


I solved this problem. There was nothing wrong with the code, and PyTorch and torchvision were up to date. I suspected it might be caused by incompatibilities between packages, so I reinstalled Anaconda and recreated the environment. Now it works. Thank you very much!