Autograd error: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior

For the calculation of the classification loss on the augmented dataset, I define calc_loss_aug as follows:

import torch

def calc_loss_aug(input, labels, bart_model, classifier, max_length=50):
    # get the tokenizers
    bart_tokenizer = bart_model.tokenizer
    classifier_tokenizer = classifier.tokenizer

    # decode the classifier-encoded input and re-encode it with the BART tokenizer
    sentence = [classifier_tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in input]
    encoding = bart_tokenizer(sentence, return_tensors='pt', padding=True)

    input_ids_bart = torch.nn.Parameter(encoding['input_ids'].float())
    attention_mask_bart = torch.nn.Parameter(encoding['attention_mask'].float())

    # run BART
    bart_logits = bart_model(input_ids_bart.long(), attention_mask_bart.long(),
                             target_ids=input_ids_bart[:, :max_length].long(),
                             decoder_attention_mask=attention_mask_bart[:, :max_length].long()).logits

    # find the decoded token ids from the probabilities
    _, summary_ids = bart_logits.max(dim=-1)

    # decode the BART output and re-encode it with the classifier tokenizer
    out = [bart_tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids]
    encoding = classifier_tokenizer(out, return_tensors='pt', padding=True)

    input_ids_classifier = torch.nn.Parameter(encoding['input_ids'].float())
    attention_mask_classifier = torch.nn.Parameter(encoding['attention_mask'].float())

    # classifier loss
    loss = classifier(input_ids_classifier.long(), attention_mask_classifier.long(), labels=labels).loss

    return loss

When I run

vector_t_dash = torch.autograd.grad(loss, bart_model.bart_model.model.encoder.parameters(), retain_graph=True)

I get the following error and have no idea how to solve it:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Can anyone help me solve it?

Hi,

If you expect that some of the parameters in the model are not used, then you can pass allow_unused=True, as suggested in the error message.
Otherwise, you should make sure that you only perform differentiable ops on Tensors that can require gradients. In particular, only floating-point Tensors can require gradients.
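
For example, here is a minimal diagnostic sketch (assuming the loss and bart_model from calc_loss_aug above) of how allow_unused=True behaves:

import torch

# With allow_unused=True, autograd returns None for every parameter that did not
# contribute to the loss, instead of raising the RuntimeError.
params = list(bart_model.bart_model.model.encoder.parameters())
grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
unused = [i for i, g in enumerate(grads) if g is None]
print(f"{len(unused)} of {len(params)} encoder parameters received no gradient")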

How do I pass a tensor output produced with one tokenizer (BART) to another tokenizer (XLNet) without converting it to a string? This conversion is where the model turns out to be non-differentiable.

I have output from the BART model which has to be passed to the XLNet model. To do so, I have to convert the BART output into the format XLNet expects, which means decoding it to a string and passing it through the XLNet tokenizer. However, this makes the pipeline non-differentiable with respect to the BART model when using autograd.
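
To illustrate where the graph gets cut (a minimal sketch reusing the names from calc_loss_aug above; the tokenizers are standard Hugging Face ones):

# summary_ids comes from an argmax, so it is an integer tensor with no grad history,
# and the decode/re-tokenize round trip produces brand-new tensors that autograd
# cannot trace back to the BART parameters.
_, summary_ids = bart_logits.max(dim=-1)
texts = [bart_tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids]
new_ids = classifier_tokenizer(texts, return_tensors='pt', padding=True)['input_ids']
print(summary_ids.requires_grad, new_ids.requires_grad)  # both False: the graph is cut here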

A workaround would be helpful.

Hi,

If your op is not differentiable, there isn’t much we can do here.
If, for your particular use case, you want to specify a special backward for the part that is not differentiable, you can do so with a custom Function: Extending PyTorch — PyTorch 1.7.1 documentation
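
For instance, here is a minimal sketch of such a custom Function, a straight-through estimator around argmax (the class name and the identity backward are illustrative choices, not the only option):

import torch
import torch.nn.functional as F

class StraightThroughArgmax(torch.autograd.Function):
    # Hard argmax (returned as one-hot vectors) in the forward pass,
    # identity gradient in the backward pass (straight-through estimator).

    @staticmethod
    def forward(ctx, logits):
        idx = logits.argmax(dim=-1)
        return F.one_hot(idx, num_classes=logits.shape[-1]).to(logits.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend the forward pass was the identity and pass the gradient through unchanged.
        return grad_output

# usage sketch: one_hot = StraightThroughArgmax.apply(bart_logits)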

Hey, thank you! This is useful.

I found a simple workaround for my task, but it requires argmax in the loss function.
So I need a way to include a differentiable argmax operator. Can you help me in this regard?

Thank you again 🙂

Well, argmax is not differentiable either, so you either have to use a “soft” version of it if you can, like softmax, or a custom Function as I mentioned above.
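
For completeness, a short sketch of the soft option (assuming bart_logits of shape (batch, seq_len, vocab_size); torch.nn.functional.gumbel_softmax is one built-in variant):

import torch.nn.functional as F

# Fully soft: a temperature-scaled softmax over the vocabulary, differentiable everywhere.
probs = F.softmax(bart_logits / 1.0, dim=-1)

# Straight-through flavour: gumbel_softmax with hard=True returns one-hot vectors in the
# forward pass but keeps the soft softmax gradients in the backward pass.
soft_one_hot = F.gumbel_softmax(bart_logits, tau=1.0, hard=True, dim=-1)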