In-place operations can be only used on variables that don't share storage with any other variables, but detected that there are 2 objects sharing it

This is the snippet:

    @staticmethod
    def backward(ctx, grad_output):
        ind_lst = ctx.ind_lst
        flag = ctx.flag

        c = grad_output.size(1)
        # split grad_output into three equal channel groups along dim 1
        grad_former_all = grad_output[:, 0:c//3, :, :]
        grad_latter_all = grad_output[:, c//3: c*2//3, :, :]
        grad_swapped_all = grad_output[:, c*2//3:c, :, :]

        spatial_size = ctx.h * ctx.w

        W_mat_all = Variable(ctx.Tensor(ctx.bz, spatial_size, spatial_size).zero_())
        for idx in range(ctx.bz):
            W_mat = W_mat_all.narrow(0,idx, 1).squeeze()
            for cnt in range(spatial_size):
                indS = ind_lst[idx][cnt] 

                if flag[cnt] == 1:
                    W_mat[cnt, indS] = 1

            W_mat_t = W_mat.t()

            grad_swapped_weighted = torch.mm(W_mat_t, grad_swapped_all[idx].view(c//3, -1).t())
            grad_swapped_weighted = grad_swapped_weighted.t().contiguous().view(1, c//3, ctx.h, ctx.w)
            grad_latter_all[idx] = torch.add(grad_latter_all[idx], grad_swapped_weighted.mul(ctx.triple_w))

I wonder which two variables share the storage.

Hi,

FYI, you can replace W_mat_all.narrow(0,idx, 1).squeeze() with W_mat_all.select(0,idx).
Also, narrow (and select) return views: the returned tensor shares its storage with the original. This means that W_mat_all and W_mat share the same storage.
So when you later try to modify W_mat in place with W_mat[cnt, indS] = 1, that storage is already shared by 2 tensors.
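
A minimal sketch of that aliasing with plain tensors (the names and shapes here are made up; untyped_storage() assumes PyTorch 2.x, and your code uses the old Variable API, but the view semantics are the same):

    import torch

    base = torch.zeros(2, 3, 3)

    view_narrow = base.narrow(0, 0, 1).squeeze(0)  # a view of base, no copy
    view_select = base.select(0, 0)                # also a view of base

    # Both views are backed by the same storage as base.
    print(view_select.untyped_storage().data_ptr()
          == base.untyped_storage().data_ptr())    # True

    # An in-place write through the view shows up in base as well.
    view_select[1, 2] = 1
    print(base[0, 1, 2].item())  # 1.0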

Hi,
Thank you for your reply. I wonder if there is a workaround for this.

If you don't want the changes to W_mat to be reflected in W_mat_all, you can call .clone() just after the select. Gradients will still flow back properly through the clone operation.
If you do want these changes to be reflected, you can index W_mat_all directly: W_mat_all[idx, cnt, indS] = 1.
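
For example, a rough sketch of the two options (bz, spatial_size, idx, cnt and indS below are placeholder values, not the real ones from your snippet):

    import torch

    bz, spatial_size = 2, 4      # placeholder sizes
    idx, cnt, indS = 0, 1, 2     # placeholder indices
    W_mat_all = torch.zeros(bz, spatial_size, spatial_size)

    # Option 1: clone the slice so the write stays local to W_mat
    # (gradients still flow back through the clone).
    W_mat = W_mat_all.select(0, idx).clone()
    W_mat[cnt, indS] = 1         # W_mat_all is untouched

    # Option 2: if the write should be reflected, index W_mat_all directly.
    W_mat_all[idx, cnt, indS] = 1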

In fact, W_mat_all is only used to compute the approximate grad_input; neither its value nor the gradient w.r.t. W_mat_all is needed afterwards. So I tried what you suggested:

    @staticmethod
    def backward(ctx, grad_output):
        ind_lst = ctx.ind_lst
        flag = ctx.flag

        c = grad_output.size(1)
        grad_former_all = grad_output[:, 0:c//3, :, :]
        grad_latter_all = grad_output[:, c//3: c*2//3, :, :]
        grad_swapped_all = grad_output[:, c*2//3:c, :, :]

        spatial_size = ctx.h * ctx.w

        W_mat_all = Variable(ctx.Tensor(ctx.bz, spatial_size, spatial_size).zero_())
        for idx in range(ctx.bz):
            # use clone() as you suggested
            W_mat = W_mat_all.select(0,idx).clone()
            for cnt in range(spatial_size):
                indS = ind_lst[idx][cnt]

                if flag[cnt] == 1:
                    W_mat[cnt, indS] = 1

            W_mat_t = W_mat.t()

            grad_swapped_weighted = torch.mm(W_mat_t, grad_swapped_all[idx].view(c//3, -1).t())
            grad_swapped_weighted = grad_swapped_weighted.t().contiguous().view(1, c//3, ctx.h, ctx.w)

            # If I delete this line, it works fine. Otherwise, it errors.
            grad_latter_all[idx] = torch.add(grad_latter_all[idx], grad_swapped_weighted.mul(ctx.triple_w))

Then it gives the error: in-place operations can be only used on variables that don't share storage with any other variables, but detected that there are 4 objects sharing it

It is confusing: which 4 objects are they?

I would guess it happens when you modify grad_latter_all in place with grad_latter_all[idx] = ....
It shares its storage with grad_output and with all the slices of grad_output that you create.
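
That would match the count: grad_output plus the three channel slices are 4 tensors backed by a single storage. A rough way to see it (a sketch with a made-up shape; untyped_storage() assumes PyTorch 2.x):

    import torch

    grad_output = torch.randn(1, 6, 2, 2)   # made-up shape for illustration
    c = grad_output.size(1)

    grad_former_all  = grad_output[:, 0:c//3, :, :]
    grad_latter_all  = grad_output[:, c//3:c*2//3, :, :]
    grad_swapped_all = grad_output[:, c*2//3:c, :, :]

    # Basic slicing returns views, so all four tensors share one storage.
    storages = {t.untyped_storage().data_ptr()
                for t in (grad_output, grad_former_all,
                          grad_latter_all, grad_swapped_all)}
    print(len(storages))  # 1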

When I add clone() to these two Variables, it works fine:

        grad_latter_all = grad_output[:, c//3: c*2//3, :, :].clone()
        grad_swapped_all = grad_output[:, c*2//3:c, :, :].clone()
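
As a quick sanity check (same made-up shape as in the sketch above, purely for illustration): after .clone() the slice owns its own storage, so the later grad_latter_all[idx] = ... write no longer touches grad_output.

    import torch

    grad_output = torch.randn(1, 6, 2, 2)   # made-up shape for illustration
    c = grad_output.size(1)

    grad_latter_all = grad_output[:, c//3:c*2//3, :, :].clone()
    print(grad_latter_all.untyped_storage().data_ptr()
          == grad_output.untyped_storage().data_ptr())  # False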

Thank you for your help.