Gradient problem - variable modified, cannot find where

I have received this error:

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4754, 300]] is at version 4; expected version 3 instead.

when I call loss.backward() on the output from customNegativeLoss.
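
As I understand it, the message means that a tensor autograd saved for the backward pass was changed in place afterwards, so its version counter no longer matches. A tiny, unrelated sketch (not my model) that produces the same kind of error:

    import torch

    a = torch.rand(3, requires_grad=True)
    b = torch.sigmoid(a)   # sigmoid saves its output for the backward pass
    b += 1                 # in-place change bumps b's version counter
    b.sum().backward()     # RuntimeError: ... modified by an inplace operation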

My code:

def customNegativeLoss(self, embedInRes, targets):
    batchSize = targets.shape[0]
    wordsCount = targets.shape[1]

    embedInRes = embedInRes.view(batchSize * wordsCount, -1)
    targets = targets.view(batchSize * wordsCount)

    emb_context = self.embedOut(targets)

    emb_product = torch.mul(embedInRes, emb_context)
    emb_product = torch.sum(emb_product, dim=1)

    pos_loss = F.logsigmoid(emb_product)

    if self.negativeSamples == -1:
        # this works
        return -(pos_loss).mean()

    noise_dist = torch.ones(self.vocabSize).to(targets.device)

    num_neg_samples_for_this_batch = (batchSize * wordsCount) * self.negativeSamples
    negative_example = torch.multinomial(noise_dist, num_neg_samples_for_this_batch, replacement=True)

    negative_example = negative_example.view(batchSize * wordsCount, self.negativeSamples)

    emb_negative = self.embedOut(negative_example)

    emb_product_neg_samples = torch.bmm(emb_negative.neg(), embedInRes.unsqueeze(2))

    noise_loss = F.logsigmoid(emb_product_neg_samples).squeeze(2)
    noise_loss = noise_loss.sum(1)

    total_loss = -(pos_loss + noise_loss).mean()

    # this crashes
    return total_loss

The embeddings are initialized as:

    self.embedIn = nn.Embedding(
        num_embeddings=self.vocabSize,
        embedding_dim=300,
        padding_idx=paddingIdx,  # index of padding
        max_norm=1,
        sparse=True
    )

    self.embedOut = nn.Embedding(
        num_embeddings=self.vocabSize,
        embedding_dim=300,
        padding_idx=paddingIdx,  # index of padding
        max_norm=1,
        sparse=True
    )

If self.negativeSamples is -1, the backward pass works, so the error must be somewhere in the negative-sampling branch, but I am unsure where. The code is for Word2Vec training with skip-grams and negative sampling (based on n0obcoder · GitHub).

Could you post a minimal, executable code snippet to reproduce the issue, please?

The problem was the combination of

    sparse=True

in the embedding layers and the torch.optim.SparseAdam optimizer.

If I set sparse=False and use the regular AdamW instead, the problem is gone.
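
For anyone hitting the same thing, a minimal sketch of the setup after the change. The vocabulary size 4754 is taken from the error message, the 300-dim size and max_norm from the post above; the padding index and learning rate are placeholders, since they are not shown in the thread:

    import torch
    import torch.nn as nn

    vocabSize = 4754   # from the error message [4754, 300]
    paddingIdx = 0     # placeholder, not shown in the thread

    # dense embeddings instead of sparse ones (sparse=False is the default)
    embedIn = nn.Embedding(
        num_embeddings=vocabSize,
        embedding_dim=300,
        padding_idx=paddingIdx,
        max_norm=1,
        sparse=False,
    )
    embedOut = nn.Embedding(
        num_embeddings=vocabSize,
        embedding_dim=300,
        padding_idx=paddingIdx,
        max_norm=1,
        sparse=False,
    )

    # regular AdamW instead of torch.optim.SparseAdam
    optimizer = torch.optim.AdamW(
        list(embedIn.parameters()) + list(embedOut.parameters()),
        lr=1e-3,  # placeholder learning rate
    )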