PyTorch gradients not being calculated for a torch.nn.Parameter variable

Gul_Zain · July 14, 2022, 5:22pm

Hello,
I am using code from the “GitHub - BatsResearch/csp: Learning to compose soft prompts for compositional zero-shot learning.” repository. I want to update the tensor soft_embeddings mentioned here: csp/csp.py at main · BatsResearch/csp · GitHub. This tensor is wrapped in torch.nn.Parameter and is the only tensor passed to the optimizer (shown in this line: csp/csp.py at main · BatsResearch/csp · GitHub). I apply a cross-entropy loss to the logits and try to update soft_embeddings through it. However, during training, I get None whenever I print model.soft_embeddings.grad. My training sequence is shown below:

if image_extractor:
    image_extractor.train()
model.train()  # switch to training mode

train_loss = 0.0
# for testing purposes
count = 0
prev_soft_embed = copy.deepcopy(model.soft_embeddings)
for idx, data in tqdm(enumerate(trainloader), total=len(trainloader), desc='Training'):
    for d in range(len(data) - 1):
        data[d] = data[d].to(device)
    data[0] = model.encode_image(data[0])
    loss, _ = model(data)
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    print("grad printing")
    print(model.soft_embeddings.grad)  # output here is None
    print("------------------------")
    optimizer.step()
    if torch.all(prev_soft_embed.eq(model.soft_embeddings)):
        print("no updates made")  # I always get this print statement
    else:
        print("updates made")

    train_loss += loss.item()
train_loss = train_loss / len(trainloader)
writer.add_scalar('Loss/train_total', train_loss, epoch)
print('Epoch: {}| Loss: {}'.format(epoch, round(train_loss, 2)))

I am not sure why this is the case. requires_grad is also set to True. Looking forward to suggestions.

Gul_Zain · July 14, 2022, 5:47pm

It was a rookie mistake, but I am leaving the answer here. I believe a for loop was breaking the computation graph. This was the loop:

for pair_index in range(len(pairs)):
    tokens[class_indices] = self.construct_token_tensors(pair_index)

I changed it to the following:

pair_indices = list(range(len(pairs)))
tokens = self.construct_token_tensors(pair_indices)

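For anyone who runs into this issue: first check whether the computation graph is being broken somewhere between the parameter and the loss. Avoid building tensors piece by piece in for loops, and avoid going through NumPy arrays and converting them back to tensors, since a tensor created from a NumPy array has no link back to the parameter it came from.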
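To illustrate the kind of check I mean, here is a minimal sketch that is not taken from the csp repository. The names soft_embeddings, other_weight and grad_reaches_embeddings are made up for the example; other_weight stands in for any other parameter that also feeds the loss. It shows how a NumPy round trip can give exactly the symptom above: loss.backward() runs without error, yet the parameter's .grad stays None.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: `soft_embeddings` plays the role of model.soft_embeddings,
# `other_weight` represents any other parameter that also contributes to the loss.
soft_embeddings = torch.nn.Parameter(torch.randn(4, 3))
other_weight = torch.nn.Parameter(torch.randn(3))


def grad_reaches_embeddings(tokens):
    """Run one backward pass and report whether soft_embeddings received a gradient."""
    soft_embeddings.grad = None
    other_weight.grad = None
    loss = (tokens * other_weight).sum()
    loss.backward()  # succeeds in both cases because other_weight requires grad
    return soft_embeddings.grad is not None


# Path 1: pure tensor indexing keeps the link back to the parameter.
connected = soft_embeddings[[0, 2]]

# Path 2: a round trip through NumPy yields a tensor autograd knows nothing about.
disconnected = torch.from_numpy(soft_embeddings.detach().numpy())[[0, 2]]

connected_has_grad_fn = connected.grad_fn is not None
disconnected_has_grad_fn = disconnected.grad_fn is not None
connected_ok = grad_reaches_embeddings(connected)
disconnected_ok = grad_reaches_embeddings(disconnected)

print("tensor path:", connected_has_grad_fn, connected_ok)
print("numpy path: ", disconnected_has_grad_fn, disconnected_ok)
```

Checking grad_fn on intermediate tensors (tokens, logits, loss) in this way is a quick way to find the step where the graph stops, assuming the rest of the forward pass stays on tensors.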